path: root/llvm/test/CodeGen
Each entry lists: commit message (author, date, files changed, -lines/+lines)
* [X86] Don't reorder (srl (and X, C1), C2) if (and X, C1) can be matched as a movzx (Craig Topper, 2018-01-23, 3 files, -1058/+1036)
  Summary: If we can match as a zero extend, there's no need to flip the order to get an encoding benefit, as movzx is 3 bytes with independent source/dest registers. The shortest 'and' we could make is also 3 bytes, unless we get lucky in the register allocator and it's on AL/AX/EAX, which have a 2-byte encoding.
  This patch was more impressive before r322957 went in; it removed some of the same ands that got deleted by that patch.
  Reviewers: spatel, RKSimon
  Reviewed By: spatel
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D42313
  llvm-svn: 323175
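  For illustration, a minimal IR sketch of the affected shape (constants assumed, not taken from the updated tests):
  ```
  ; The 'and' below is already matchable as movzbl (3 bytes, separate
  ; src/dst), so reordering the shift in front of it gains nothing.
  define i32 @shifted_mask(i32 %x) {
    %a = and i32 %x, 255
    %s = lshr i32 %a, 2
    ret i32 %s
  }
  ```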
* [X86] Remove 'NOREX' comment from the printing of _NOREX instructions. (Craig Topper, 2018-01-23, 9 files, -50/+50)
  Some of the NOREX instructions are used in 32-bit mode, making this printing confusing. It also doesn't provide a lot of value, since you can see the h-register being used by the instruction.
  llvm-svn: 323174
* Introduce the "retpoline" x86 mitigation technique for variant #2 of the ↵Chandler Carruth2018-01-223-0/+536
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | speculative execution vulnerabilities disclosed today, specifically identified by CVE-2017-5715, "Branch Target Injection", and is one of the two halves to Spectre.. Summary: First, we need to explain the core of the vulnerability. Note that this is a very incomplete description, please see the Project Zero blog post for details: https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html The basis for branch target injection is to direct speculative execution of the processor to some "gadget" of executable code by poisoning the prediction of indirect branches with the address of that gadget. The gadget in turn contains an operation that provides a side channel for reading data. Most commonly, this will look like a load of secret data followed by a branch on the loaded value and then a load of some predictable cache line. The attacker then uses timing of the processors cache to determine which direction the branch took *in the speculative execution*, and in turn what one bit of the loaded value was. Due to the nature of these timing side channels and the branch predictor on Intel processors, this allows an attacker to leak data only accessible to a privileged domain (like the kernel) back into an unprivileged domain. The goal is simple: avoid generating code which contains an indirect branch that could have its prediction poisoned by an attacker. In many cases, the compiler can simply use directed conditional branches and a small search tree. LLVM already has support for lowering switches in this way and the first step of this patch is to disable jump-table lowering of switches and introduce a pass to rewrite explicit indirectbr sequences into a switch over integers. However, there is no fully general alternative to indirect calls. We introduce a new construct we call a "retpoline" to implement indirect calls in a non-speculatable way. It can be thought of loosely as a trampoline for indirect calls which uses the RET instruction on x86. Further, we arrange for a specific call->ret sequence which ensures the processor predicts the return to go to a controlled, known location. The retpoline then "smashes" the return address pushed onto the stack by the call with the desired target of the original indirect call. The result is a predicted return to the next instruction after a call (which can be used to trap speculative execution within an infinite loop) and an actual indirect branch to an arbitrary address. On 64-bit x86 ABIs, this is especially easily done in the compiler by using a guaranteed scratch register to pass the target into this device. For 32-bit ABIs there isn't a guaranteed scratch register and so several different retpoline variants are introduced to use a scratch register if one is available in the calling convention and to otherwise use direct stack push/pop sequences to pass the target address. This "retpoline" mitigation is fully described in the following blog post: https://support.google.com/faqs/answer/7625886 We also support a target feature that disables emission of the retpoline thunk by the compiler to allow for custom thunks if users want them. 
These are particularly useful in environments like kernels that routinely do hot-patching on boot and want to hot-patch their thunk to different code sequences. They can write this custom thunk and use `-mretpoline-external-thunk` *in addition* to `-mretpoline`. In this case, on x86-64 thu thunk names must be: ``` __llvm_external_retpoline_r11 ``` or on 32-bit: ``` __llvm_external_retpoline_eax __llvm_external_retpoline_ecx __llvm_external_retpoline_edx __llvm_external_retpoline_push ``` And the target of the retpoline is passed in the named register, or in the case of the `push` suffix on the top of the stack via a `pushl` instruction. There is one other important source of indirect branches in x86 ELF binaries: the PLT. These patches also include support for LLD to generate PLT entries that perform a retpoline-style indirection. The only other indirect branches remaining that we are aware of are from precompiled runtimes (such as crt0.o and similar). The ones we have found are not really attackable, and so we have not focused on them here, but eventually these runtimes should also be replicated for retpoline-ed configurations for completeness. For kernels or other freestanding or fully static executables, the compiler switch `-mretpoline` is sufficient to fully mitigate this particular attack. For dynamic executables, you must compile *all* libraries with `-mretpoline` and additionally link the dynamic executable and all shared libraries with LLD and pass `-z retpolineplt` (or use similar functionality from some other linker). We strongly recommend also using `-z now` as non-lazy binding allows the retpoline-mitigated PLT to be substantially smaller. When manually apply similar transformations to `-mretpoline` to the Linux kernel we observed very small performance hits to applications running typical workloads, and relatively minor hits (approximately 2%) even for extremely syscall-heavy applications. This is largely due to the small number of indirect branches that occur in performance sensitive paths of the kernel. When using these patches on statically linked applications, especially C++ applications, you should expect to see a much more dramatic performance hit. For microbenchmarks that are switch, indirect-, or virtual-call heavy we have seen overheads ranging from 10% to 50%. However, real-world workloads exhibit substantially lower performance impact. Notably, techniques such as PGO and ThinLTO dramatically reduce the impact of hot indirect calls (by speculatively promoting them to direct calls) and allow optimized search trees to be used to lower switches. If you need to deploy these techniques in C++ applications, we *strongly* recommend that you ensure all hot call targets are statically linked (avoiding PLT indirection) and use both PGO and ThinLTO. Well tuned servers using all of these techniques saw 5% - 10% overhead from the use of retpoline. We will add detailed documentation covering these components in subsequent patches, but wanted to make the core functionality available as soon as possible. Happy for more code review, but we'd really like to get these patches landed and backported ASAP for obvious reasons. We're planning to backport this to both 6.0 and 5.0 release streams and get a 5.0 release with just this cherry picked ASAP for distros and vendors. This patch is the work of a number of people over the past month: Eric, Reid, Rui, and myself. I'm mailing it out as a single commit due to the time sensitive nature of landing this and the need to backport it. 
Huge thanks to everyone who helped out here, and everyone at Intel who helped out in discussions about how to craft this. Also, credit goes to Paul Turner (at Google, but not an LLVM contributor) for much of the underlying retpoline design. Reviewers: echristo, rnk, ruiu, craig.topper, DavidKreitzer Subscribers: sanjoy, emaste, mcrosier, mgorny, mehdi_amini, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D41723 llvm-svn: 323155
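  For reference, a rough sketch of the device itself (adapted from the blog post linked above; label names are illustrative):
  ```
  ; An ordinary indirect call like this one...
  define void @icall(void ()* %f) {
    call void %f()
    ret void
  }
  ; ...is lowered under -mretpoline to a call through the thunk, roughly:
  ;   callq   __llvm_retpoline_r11        # target passed in %r11
  ;
  ; __llvm_retpoline_r11:
  ;   callq   .Lset_up_target
  ; .Lcapture_speculation:                # speculation is trapped here
  ;   pause
  ;   lfence
  ;   jmp     .Lcapture_speculation
  ; .Lset_up_target:
  ;   movq    %r11, (%rsp)                # smash the return address
  ;   retq                                # predicted into the capture loop
  ```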
* [AMDGPU] SI Load Store Optimizer: When merging with offset, use V_ADD_{I|U}32_e64 (Mark Searles, 2018-01-22, 2 files, -22/+82)
  - Change the inserted add (V_ADD_{I|U}32_e32) to the _e64 version (V_ADD_{I|U}32_e64) so that the add uses a vreg for the carry; this prevents the inserted v_add from killing VCC. The _e64 version doesn't accept a literal in its encoding, so we also need to introduce a mov instruction to get the immediate into a register.
  - Change the pass name to "SI Load Store Optimizer"; this removes the '/', which complicates scripts.
  Differential Revision: https://reviews.llvm.org/D42124
  llvm-svn: 323153
* [mips] add warnings for using dsp and msa flags with inappropriate revisions (Petar Jovanovic, 2018-01-22, 1 file, -0/+44)
  dsp and dspr2 require MIPS revision 2, while msa requires revision 5. This adds warnings for cases where these flags are used with earlier revisions.
  Patch by Milos Stojanovic.
  Differential Revision: https://reviews.llvm.org/D40490
  llvm-svn: 323131
* [AArch64] optimise v4f16 fcmps to utilise vector instructions (Carey Williams, 2018-01-22, 1 file, -176/+85)
  Improves the code generation for v4f16 FCMP instructions when FullFP16 is not supported, generating FCVTL(s) rather than a longer series of FCVTs.
  Differential Revision: https://reviews.llvm.org/D41772
  llvm-svn: 323118
* [X86][AVX] Add test case for PR34370 (Simon Pilgrim, 2018-01-22, 1 file, -0/+78)
  llvm-svn: 323106
* [X86][SSE] Add ISD::VECTOR_SHUFFLE to faux shuffle decoding (Reapplied) (Simon Pilgrim, 2018-01-22, 8 files, -270/+227)
  Primarily, this allows us to use the aggressive extraction mechanisms in combineExtractWithShuffle earlier and make use of UNDEF elements that may be lost during lowering.
  Reapplied after rL322279 was reverted at rL322335 due to PR35918; the underlying issue was fixed at rL322644.
  llvm-svn: 323104
* Break false dependencies for POPCNT, LZCNT, TZCNT (Marina Yatsina, 2018-01-22, 1 file, -0/+171)
  Add POPCNT, LZCNT, and TZCNT to the list of instructions that have a false dependency. Add a test to make sure BreakFalseDeps breaks the dependencies for these instructions. Update affected tests.
  This fixes bugzilla https://bugs.llvm.org/show_bug.cgi?id=33869
  This is the final patch of the series that fixes this bugzilla. Most of the patches are aimed at refactoring the existing code.
  Reviews of the refactoring done to enable this change:
  https://reviews.llvm.org/D40330
  https://reviews.llvm.org/D40331
  https://reviews.llvm.org/D40332
  https://reviews.llvm.org/D40333
  Differential Revision: https://reviews.llvm.org/D40334
  Change-Id: If95cbf1a3f5c7dccff8f1b22ecb397542147303d
  llvm-svn: 323096
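  As a concrete sketch of what breaking the dependency means (register choices illustrative, not taken from the new test):
  ```
  define i64 @count(i64 %x) {
    %c = call i64 @llvm.ctpop.i64(i64 %x)
    ret i64 %c
  }
  declare i64 @llvm.ctpop.i64(i64)
  ; popcnt falsely depends on its destination register on some CPUs, so
  ; BreakFalseDeps clears the destination first, e.g.:
  ;   xorl    %eax, %eax        # breaks the dependency on %rax
  ;   popcntq %rdi, %rax
  ```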
* Separate ExecutionDepsFix into 4 parts: (Marina Yatsina, 2018-01-22, 2 files, -3/+3)
  1. ReachingDefsAnalysis - allows identifying, for each instruction, the "closest" reaching def of a certain register. Used by BreakFalseDeps (for clearance calculation) and ExecutionDomainFix (for arbitrating conflicting domains).
  2. ExecutionDomainFix - changes the variant of the instructions in order to minimize domain crossings.
  3. BreakFalseDeps - breaks false dependencies.
  4. LoopTraversal - creates a traversal order of the basic blocks that is optimal for loops (introduced in revision rL293571). Both ExecutionDomainFix and ReachingDefsAnalysis use this to determine the order in which they will traverse the basic blocks.
  This also includes the following changes to the original ExecutionDepsFix logic:
  1. The BreakFalseDeps and ReachingDefsAnalysis logic is no longer restricted by a register class.
  2. ReachingDefsAnalysis tracks liveness of register units instead of register indices into a given register class.
  Additional changes in affected files:
  1. The X86 and ARM targets now inherit from ExecutionDomainFix instead of ExecutionDepsFix. BreakFalseDeps was also added to the passes they activate.
  2. Comments and references to ExecutionDepsFix were replaced with ExecutionDomainFix and BreakFalseDeps, as appropriate.
  Additional refactoring changes will follow.
  This commit is (almost) NFC. The only functional change is that BreakFalseDeps will now break dependencies for all register classes. Since no additional instructions were added to the list of instructions that have false dependencies, there is no actual change yet. In a future commit several instructions (and tests) will be added.
  This is the first of multiple patches that fix bugzilla https://bugs.llvm.org/show_bug.cgi?id=33869. Most of the patches are aimed at refactoring the existing code.
  Additional relevant reviews:
  https://reviews.llvm.org/D40331
  https://reviews.llvm.org/D40332
  https://reviews.llvm.org/D40333
  https://reviews.llvm.org/D40334
  Differential Revision: https://reviews.llvm.org/D40330
  Change-Id: Icaeb75e014eff96a8f721377783f9a3e6c679275
  llvm-svn: 323087
* [X86] Add an override of targetShrinkDemandedConstant to limit the damage that shrinkdemandedbits can do to zext_in_reg operations (Craig Topper, 2018-01-20, 4 files, -25/+21)
  Summary: This patch adds an implementation of targetShrinkDemandedConstant that tries to keep shrinkdemandedbits from removing bits that would otherwise have been recognized as a movzx.
  We still need a follow-up patch to stop moving ands across srl if the and could be represented as a movzx before the shift but not after.
  I think this should help with some of the cases that D42088 ended up removing during isel.
  Reviewers: spatel, RKSimon
  Reviewed By: spatel
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D42265
  llvm-svn: 323048
* Move new test from Generic to SystemZ. (Jonas Paulsson, 2018-01-20, 1 file, -0/+0)
  A few build bots failed with r323042 because they are not configured to build the SystemZ target.
  llvm-svn: 323044
* [SelectionDAG] Fix codegen of vector stores with non byte-sized elements. (Jonas Paulsson, 2018-01-20, 8 files, -5900/+7229)
  This was completely broken, but hopefully fixed by this patch.
  In cases where it is needed, a vector with non byte-sized elements is stored by extracting, zero-extending, shifting, and OR'ing the elements into an integer of the same width as the vector, which is then stored.
  Review: Eli Friedman, Ulrich Weigand
  https://reviews.llvm.org/D42100#inline-369520
  https://bugs.llvm.org/show_bug.cgi?id=35520
  llvm-svn: 323042
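  A minimal sketch of the kind of store this affects (element type chosen for illustration, not taken from the patch's tests):
  ```
  ; <8 x i4> occupies 32 bits, but its elements are not byte-addressable.
  ; Each element is extracted, zero-extended, shifted into its bit position,
  ; and OR'd into an i32, and that i32 is what actually gets stored.
  define void @store_v8i4(<8 x i4> %v, <8 x i4>* %p) {
    store <8 x i4> %v, <8 x i4>* %p
    ret void
  }
  ```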
* [X86] Add some more v32i1 shuffle tests with shuffles between mask creation and mask usage rather than just shuffling input arguments (Craig Topper, 2018-01-20, 1 file, -0/+244)
  The existing tests just tested shuffles of v32i1 inputs, but arguments are promoted to v32i8, so they weren't a good demonstration of v32i1 shuffle handling. The new test cases use compares and selects to get k-register operations around the shuffle.
  This is prep work for demonstrating changes from D42031.
  llvm-svn: 323031
* [X86] Add test cases for failures to use movzx due to various issues with demanded bits (Craig Topper, 2018-01-20, 1 file, -0/+108)
  D42265 and D42313 should help with some of these.
  llvm-svn: 323030
* test: fix ARM tests harder (Saleem Abdulrasool, 2018-01-20, 1 file, -1/+0)
  Remove the missed check update for the removal of the x86-specific vector call on ARM.
  llvm-svn: 323023
* test: move ARM test from x86 (Saleem Abdulrasool, 2018-01-20, 2 files, -1/+14)
  The ARM backend is not guaranteed to be present on x86; move the test to the ARM tests.
  llvm-svn: 323021
* CodeGen: handle llvm.used properly for COFF (Saleem Abdulrasool, 2018-01-20, 1 file, -0/+20)
  `llvm.used` contains a list of pointers to named values which the compiler, assembler, and linker are required to treat as if there is a reference that they cannot see. Ensure that the symbols are preserved by adding an explicit `-include` reference to the linker command.
  llvm-svn: 323017
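  A minimal sketch of the IR this applies to (symbol name assumed for illustration):
  ```
  ; Everything listed in @llvm.used must survive to the object file even
  ; with no visible uses; on COFF this is conveyed to the linker via an
  ; explicit include directive.
  @keep = internal global i32 42
  @llvm.used = appending global [1 x i8*]
      [i8* bitcast (i32* @keep to i8*)], section "llvm.metadata"
  ```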
* [X86] Teach X86 codegen to use vector width preference to avoid promoting to 512-bit types when VLX is enabled and the preference is for a smaller size (Craig Topper, 2018-01-20, 7 files, -0/+1258)
  This change applies to places where we would turn 128/256-bit code into 512-bit in order to get a wider element type through sext/zext. Any 512-bit types that already existed in the IR/DAG will be left that way.
  The width preference has no effect on codegen behavior when the target does not have AVX512 enabled, so AVX/AVX2 codegen cannot be limited via this mechanism yet.
  If the preference is lower than 256 we may still use a 256-bit type to do the operation. Constraining to 128 bits makes it much more difficult to support some operations. For many of these cases we need to change the element width while keeping the element count constant, which is most easily done by switching between 256-bit and 128-bit.
  The preference is only obeyed when AVX512 and VLX are available. This means the preference is not obeyed for KNL, but is obeyed for SKX, Cannonlake, and Icelake. For KNL, the only way to do a masked operation is on 512-bit registers, so we would have to completely disable masking to obey the preference. We would also lose support for gather, scatter, ctlz, vXi64 multiplies, etc. This may change in the future, but this simplifies the initial implementation.
  Differential Revision: https://reviews.llvm.org/D41895
  llvm-svn: 323016
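  A hedged sketch of how the preference surfaces in IR (attribute spelling assumed; the operation is illustrative):
  ```
  ; A <32 x i8> multiply has no direct x86 instruction, so it is done by
  ; extending to i16. With the preference set to 256, the extension stays
  ; as two 256-bit halves instead of one 512-bit op (assuming AVX512BW+VLX).
  define <32 x i8> @mul_v32i8(<32 x i8> %a, <32 x i8> %b) "prefer-vector-width"="256" {
    %m = mul <32 x i8> %a, %b
    ret <32 x i8> %m
  }
  ```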
* [x86] add tests for sqrt estimate that should respect denorms; NFC (PR34994) (Sanjay Patel, 2018-01-19, 1 file, -0/+67)
  llvm-svn: 323003
* [X86] Autogenerate complete checks on a couple tests. NFC (Craig Topper, 2018-01-19, 2 files, -46/+110)
  llvm-svn: 322997
* Add optional DICompileUnit to DIBuilder + make outliner debug info use it (Jessica Paquette, 2018-01-19, 1 file, -12/+7)
  Previously, the DIBuilder didn't expose functionality to set its compile unit in any other way than calling createCompileUnit. This meant that the outliner, which creates new functions, had to create a new compile unit for its debug info.
  This commit adds an optional parameter to the DIBuilder's constructor which lets you set its CU at construction.
  It also changes the MachineOutliner so that it keeps track of the DISubprograms for each outlined sequence. If debugging information is requested, it uses one of the outlined sequence's DISubprograms to grab a CU, then uses that CU to construct the DISubprogram for the new outlined function. The test has also been updated to reflect this change.
  See https://reviews.llvm.org/D42254 for more information. Also see the e-mail discussion on D42254 in llvm-commits for more context.
  llvm-svn: 322992
* [SystemZ] Prefer LOCHI over generating IPM sequences (Ulrich Weigand, 2018-01-19, 3 files, -50/+42)
  On current machines we have load-on-condition instructions that can be used to directly implement the SETCC semantics. If we have those, it is always preferable to use them instead of generating the IPM sequence.
  llvm-svn: 322989
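  An illustrative SETCC pattern that can now use a load-halfword-immediate-on-condition (the expected mnemonics are approximate, not taken from the new tests):
  ```
  define i32 @setcc(i32 %a, i32 %b) {
    %c = icmp eq i32 %a, %b
    %r = zext i1 %c to i32
    ret i32 %r
  }
  ; After the compare, roughly: load 0, then LOCHI the 1 on condition,
  ; instead of materializing CC through IPM and shifting:
  ;   lhi    %r2, 0
  ;   lochie %r2, 1
  ```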
* [SystemZ] Directly use CC result of compare-and-swap (Ulrich Weigand, 2018-01-19, 5 files, -0/+287)
  In order to implement a test whether a compare-and-swap succeeded, the SystemZ back-end currently emits a rather inefficient sequence of first converting the CC result into an integer and then testing that integer against zero. This commit changes the back-end to directly test the CC value set by the compare-and-swap instruction.
  llvm-svn: 322988
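  A minimal example of the pattern in question (standard IR, not copied from the new tests):
  ```
  define i1 @cas_succeeded(i32* %p, i32 %old, i32 %new) {
    %r = cmpxchg i32* %p, i32 %old, i32 %new seq_cst seq_cst
    ; The success flag can now be branched on (or materialized) straight
    ; from the CC set by the compare-and-swap, with no intervening IPM.
    %ok = extractvalue { i32, i1 } %r, 1
    ret i1 %ok
  }
  ```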
* [SystemZ] Rework IPM sequence generation (Ulrich Weigand, 2018-01-19, 3 files, -4/+250)
  The SystemZ back-end uses a sequence of IPM followed by arithmetic operations to implement the SETCC primitive. This is currently done early during SelectionDAG. This patch moves generation of those sequences to much later in SelectionDAG (during PreprocessISelDAG).
  This doesn't change much in the generated code by itself, but it allows further enhancements that will be checked in as follow-on commits.
  llvm-svn: 322987
* [SystemZ] Run branch-12.ll test only if long tests enabled (Ulrich Weigand, 2018-01-19, 2 files, -1/+1)
  This avoids excessive test run times, e.g. with expensive checks enabled.
  llvm-svn: 322983
* [X86][SSE] Add SSE2 gather tests (Simon Pilgrim, 2018-01-19, 1 file, -74/+156)
  Check codegen without PEXTRD.
  llvm-svn: 322974
* [ARM] Fix perf regression in compare optimization. (Joel Galenson, 2018-01-19, 1 file, -0/+32)
  Fix a performance regression caused by r322737. While trying to make it easier to replace compares with existing adds and subtracts, I accidentally stopped it from doing so in some cases. This should fix that. I'm also fixing another potential bug in that commit.
  Differential Revision: https://reviews.llvm.org/D42263
  llvm-svn: 322972
* [WebAssembly] Fix libcall signature lookup (Derek Schuff, 2018-01-19, 1 file, -0/+107)
  RuntimeLibcallSignatures previously manually initialized all the libcall names into an array and searched it linearly for the first match to look up the corresponding index. r322802 switched that to initializing a map keyed by the libcall name. Neither of these approaches works correctly, because some libcall numbers use the same name on different platforms (e.g. the "l"-suffixed functions use f80 or f128 or ppcf128). This change fixes that by ensuring that each name only goes into the map once. It also adds tests.
  Differential Revision: https://reviews.llvm.org/D42271
  llvm-svn: 322971
* [WebAssembly] Make sign-extension opcodes a distinct feature. (Dan Gohman, 2018-01-19, 3 files, -12/+12)
  Sign-extension opcodes have been split into a separate proposal from the main threads proposal, so switch them to their own target feature.
  See: https://github.com/WebAssembly/sign-extension-ops
  llvm-svn: 322966
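  A hedged sketch of gating on the new feature (the feature string is an assumption based on the description above):
  ```
  ; With the sign-extension feature enabled, a sext-of-trunc can select an
  ; instruction like i32.extend8_s directly instead of a shift pair.
  define i32 @sext8(i32 %x) "target-features"="+sign-ext" {
    %t = trunc i32 %x to i8
    %e = sext i8 %t to i32
    ret i32 %e
  }
  ```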
* Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1) (Daniel Neilson, 2018-01-19, 196 files, -1091/+1070)
  Summary: This is a resurrection of work first proposed and discussed in Aug 2015:
  http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html
  and initially landed (but then backed out) in Nov 2015:
  http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
  The @llvm.memcpy/memmove/memset intrinsics currently have an explicit argument which is required to be a constant integer. It represents the alignment of the dest (and source), and so must be the minimum of the actual alignment of the two.
  This change is the first in a series that allows source and dest to each have their own alignments by using the alignment attribute on their arguments. In this change we:
  1) Remove the alignment argument.
  2) Add alignment attributes to the source & dest arguments. We, temporarily, require that the alignments for source & dest be equal.
  For example, code which used to read:
  ```
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 100, i32 4, i1 false)
  ```
  will now read
  ```
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %dest, i8* align 4 %src, i32 100, i1 false)
  ```
  Downstream users may have to update their lit tests that check for @llvm.memcpy/memmove/memset call/declaration patterns. The following extended sed script may help with updating the majority of your tests, but it does not catch all possible patterns, so some manual checking and updating will be required:
  ```
  s~declare void @llvm\.mem(set|cpy|move)\.p([^(]*)\((.*), i32, i1\)~declare void @llvm.mem\1.p\2(\3, i1)~g
  s~call void @llvm\.memset\.p([^(]*)i8\(i8([^*]*)\* (.*), i8 (.*), i8 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i8(i8\2* \3, i8 \4, i8 \5, i1 \6)~g
  s~call void @llvm\.memset\.p([^(]*)i16\(i8([^*]*)\* (.*), i8 (.*), i16 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i16(i8\2* \3, i8 \4, i16 \5, i1 \6)~g
  s~call void @llvm\.memset\.p([^(]*)i32\(i8([^*]*)\* (.*), i8 (.*), i32 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i32(i8\2* \3, i8 \4, i32 \5, i1 \6)~g
  s~call void @llvm\.memset\.p([^(]*)i64\(i8([^*]*)\* (.*), i8 (.*), i64 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i64(i8\2* \3, i8 \4, i64 \5, i1 \6)~g
  s~call void @llvm\.memset\.p([^(]*)i128\(i8([^*]*)\* (.*), i8 (.*), i128 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i128(i8\2* \3, i8 \4, i128 \5, i1 \6)~g
  s~call void @llvm\.memset\.p([^(]*)i8\(i8([^*]*)\* (.*), i8 (.*), i8 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i8(i8\2* align \6 \3, i8 \4, i8 \5, i1 \7)~g
  s~call void @llvm\.memset\.p([^(]*)i16\(i8([^*]*)\* (.*), i8 (.*), i16 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i16(i8\2* align \6 \3, i8 \4, i16 \5, i1 \7)~g
  s~call void @llvm\.memset\.p([^(]*)i32\(i8([^*]*)\* (.*), i8 (.*), i32 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i32(i8\2* align \6 \3, i8 \4, i32 \5, i1 \7)~g
  s~call void @llvm\.memset\.p([^(]*)i64\(i8([^*]*)\* (.*), i8 (.*), i64 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i64(i8\2* align \6 \3, i8 \4, i64 \5, i1 \7)~g
  s~call void @llvm\.memset\.p([^(]*)i128\(i8([^*]*)\* (.*), i8 (.*), i128 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i128(i8\2* align \6 \3, i8 \4, i128 \5, i1 \7)~g
  s~call void @llvm\.mem(cpy|move)\.p([^(]*)i8\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i8 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i8(i8\3* \4, i8\5* \6, i8 \7, i1 \8)~g
  s~call void @llvm\.mem(cpy|move)\.p([^(]*)i16\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i16 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i16(i8\3* \4, i8\5* \6, i16 \7, i1 \8)~g
  s~call void @llvm\.mem(cpy|move)\.p([^(]*)i32\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i32 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i32(i8\3* \4, i8\5* \6, i32 \7, i1 \8)~g
  s~call void @llvm\.mem(cpy|move)\.p([^(]*)i64\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i64 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i64(i8\3* \4, i8\5* \6, i64 \7, i1 \8)~g
  s~call void @llvm\.mem(cpy|move)\.p([^(]*)i128\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i128 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i128(i8\3* \4, i8\5* \6, i128 \7, i1 \8)~g
  s~call void @llvm\.mem(cpy|move)\.p([^(]*)i8\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i8 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i8(i8\3* align \8 \4, i8\5* align \8 \6, i8 \7, i1 \9)~g
  s~call void @llvm\.mem(cpy|move)\.p([^(]*)i16\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i16 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i16(i8\3* align \8 \4, i8\5* align \8 \6, i16 \7, i1 \9)~g
  s~call void @llvm\.mem(cpy|move)\.p([^(]*)i32\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i32 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i32(i8\3* align \8 \4, i8\5* align \8 \6, i32 \7, i1 \9)~g
  s~call void @llvm\.mem(cpy|move)\.p([^(]*)i64\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i64 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i64(i8\3* align \8 \4, i8\5* align \8 \6, i64 \7, i1 \9)~g
  s~call void @llvm\.mem(cpy|move)\.p([^(]*)i128\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i128 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i128(i8\3* align \8 \4, i8\5* align \8 \6, i128 \7, i1 \9)~g
  ```
  The remaining changes in the series will:
  Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing source and dest alignments.
  Step 3) Update Clang to use the new IRBuilder API.
  Step 4) Update Polly to use the new IRBuilder API.
  Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API, and those that use MemIntrinsicInst::[get|set]Alignment() to use getDestAlignment() and getSourceAlignment() instead.
  Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the MemIntrinsicInst::[get|set]Alignment() methods.
  Reviewers: pete, hfinkel, lhames, reames, bollu
  Reviewed By: reames
  Subscribers: niosHD, reames, jholewinski, qcolombet, jfb, sanjoy, arsenm, dschuff, dylanmckay, mehdi_amini, sdardis, nemanjai, david2050, nhaehnle, javed.absar, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, llvm-commits
  Differential Revision: https://reviews.llvm.org/D41675
  llvm-svn: 322965
* [x86] add RUN line and auto-generate checks (Sanjay Patel, 2018-01-19, 1 file, -55/+182)
  There were checks for a 32-bit target here, but no RUN line corresponding to that prefix. I don't know what the intent of these tests is, but at least now we can see what happens for both targets.
  llvm-svn: 322961
* [x86] regenerate complete checks; NFC (Sanjay Patel, 2018-01-19, 1 file, -30/+51)
  D42265 will improve something here, but it's not obvious how without more checks.
  llvm-svn: 322960
* [x86] shrink 'and' immediate values by setting the high bits (PR35907) (Sanjay Patel, 2018-01-19, 11 files, -148/+127)
  Try to reverse the constant-shrinking that happens in SimplifyDemandedBits() for 'and' masks when it results in a smaller sign-extended immediate. We are also able to detect dead 'and' ops here (the mask is all ones). In that case, we replace and return without selecting the 'and'.
  Other targets might want to share some of this logic by enabling this under a target hook, but I didn't see diffs for simple cases with PowerPC or AArch64, so they may already have some specialized logic for this kind of thing or have different needs.
  This should solve PR35907: https://bugs.llvm.org/show_bug.cgi?id=35907
  Differential Revision: https://reviews.llvm.org/D42088
  llvm-svn: 322957
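  An illustrative case in the spirit of PR35907 (constants assumed, not taken from the changed tests):
  ```
  ; If only the low bits of the result are consumed, SimplifyDemandedBits
  ; may shrink a mask like -256 (0xFF...F00) to 0xFFFFFF00, forcing a
  ; 4-byte immediate; restoring the high bits lets the 'and' use a
  ; sign-extended 1-byte immediate, e.g. andq $-256, %rdi.
  define i64 @mask(i64 %x) {
    %a = and i64 %x, -256
    ret i64 %a
  }
  ```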
* [X86] Extend load-op-store fusion merge to ADC/SBB. (Nirav Dave, 2018-01-19, 2 files, -2/+297)
  Summary: Add handling of EFLAGS input to X86 load-op-store fusion checking.
  Reviewers: craig.topper, RKSimon
  Subscribers: llvm-commits, hiraditya
  Differential Revision: https://reviews.llvm.org/D42128
  llvm-svn: 322952
* [X86][AVX] Add more variable permute tests for source vectors smaller than destination (Simon Pilgrim, 2018-01-19, 1 file, -8/+1060)
  llvm-svn: 322948
* Fix line endings. NFCI. (Simon Pilgrim, 2018-01-19, 1 file, -30/+30)
  llvm-svn: 322940
* [X86] Add KNL target to slow PMULLD tests (Simon Pilgrim, 2018-01-19, 1 file, -0/+2)
  llvm-svn: 322939
* [X86] Add RDPID schedule test (Simon Pilgrim, 2018-01-19, 1 file, -0/+20)
  llvm-svn: 322938
* [X86] Regenerate RDPMC intrinsic test (Simon Pilgrim, 2018-01-19, 1 file, -12/+16)
  llvm-svn: 322937
* Split TailDuplicatePass into pre- and post-RA variant; NFC (Matthias Braun, 2018-01-19, 1 file, -1/+1)
  Split TailDuplicatePass into EarlyTailDuplicate and TailDuplicate. This avoids playing games with fake pass IDs and using MRI::isSSA() to determine pre-/post-RA state.
  llvm-svn: 322926
* Move tests to the correct place (Matthias Braun, 2018-01-19, 19 files, -0/+0)
  test/CodeGen/MIR is for testing the MIR parser/printer. Tests for passes and targets belong in test/CodeGen/TARGETNAME.
  llvm-svn: 322925
* Revert [CGP] Re-enable Select in complex addressing mode (Serguei Katkov, 2018-01-19, 1 file, -1/+1)
  One of the buildbots failed; reverting for now until the issue is fixed.
  llvm-svn: 322923
* AArch64: Fix emergency spillslot being out of reach for large callframes (Matthias Braun, 2018-01-19, 1 file, -0/+15)
  Re-commit of r322200: the testcase shouldn't hit machine verifiers anymore with r322917 in place.
  Large callframes (calls with several hundreds or thousands of parameters) could lead to situations in which the emergency spillslot is out of range to be addressed relative to the stack pointer. This commit forces the use of a frame pointer in the presence of large callframes.
  This commit does several things:
  - Compute max callframe size at the end of instruction selection.
  - Add a mirFileLoaded target callback. Use it to compute the max callframe size after loading a .mir file when the size wasn't specified in the file.
  - Let TargetFrameLowering::hasFP() return true if there exists a callframe > 255 bytes.
  - Always place the emergency spillslot close to FP if we have a frame pointer.
  - Note that `useFPForScavengingIndex()` would previously return false when a base pointer was available, leading to the emergency spillslot getting allocated late (that's the whole effect of this callback). That made no sense to me, so I took this case out: even though the emergency spillslot is technically not referenced by FP in this case, we still want it allocated early.
  Differential Revision: https://reviews.llvm.org/D40876
  llvm-svn: 322919
* AArch64: Omit callframe setup/destroy when not necessary (Matthias Braun, 2018-01-19, 7 files, -62/+74)
  Do not create CALLSEQ_START/CALLSEQ_END when there is no callframe to set up and the callframe size is 0.
  - Fixes an invalid callframe nesting for byval arguments, which would look like this before this patch (as in `big-byval.ll`):
  ```
  ADJCALLSTACKDOWN 32768, 0, ...   # Setup for extfunc
  ...
  ADJCALLSTACKDOWN 0, 0, ...       # setup for memcpy
  ...
  BL &memcpy ...
  ADJCALLSTACKUP 0, 0, ...         # destroy for memcpy
  ...
  BL &extfunc
  ADJCALLSTACKUP 32768, 0, ...     # destroy for extfunc
  ```
  - Saves us two instructions in the common case of zero-sized stackframes.
  - Removes an unnecessary scheduling barrier (hence the small unittest changes).
  Differential Revision: https://reviews.llvm.org/D42006
  llvm-svn: 322917
* [X86] Add intrinsic support for the RDPID instruction (Craig Topper, 2018-01-18, 1 file, -0/+21)
  This adds a new intrinsic to support the rdpid instruction. The implementation is a bit weird because the intrinsic is defined as always returning 32 bits, but the assembler support thinks the instruction produces a 64-bit register in 64-bit mode. In reality it zeros the upper 32 bits, so separate patterns were added where 64-bit mode uses an extract_subreg.
  Differential Revision: https://reviews.llvm.org/D42205
  llvm-svn: 322910
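  A minimal usage sketch based on the description above (the intrinsic name is assumed to follow the usual llvm.x86.* convention):
  ```
  declare i32 @llvm.x86.rdpid()

  define i32 @processor_id() {
    ; Always returns 32 bits; in 64-bit mode the instruction defines a
    ; 64-bit register whose upper half is zero, so selection extracts the
    ; low 32 bits via extract_subreg.
    %id = call i32 @llvm.x86.rdpid()
    ret i32 %id
  }
  ```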
* AMDGPU/SI: Add d16 support for image intrinsics. (Changpeng Fang, 2018-01-18, 3 files, -0/+397)
  Summary: This patch implements d16 support for the image load, image store, and image sample intrinsics.
  Reviewers: Matt, Brian.
  Differential Revision: https://reviews.llvm.org/D3991
  llvm-svn: 322903
* [test] Actually check the common parts in CodeGen/ARM/global-merge-external.ll. NFC. (Martin Storsjo, 2018-01-18, 1 file, -7/+7)
  Previously, these parts weren't ever checked. The label patterns need to be extended to match successfully on Mach-O.
  Differential Revision: https://reviews.llvm.org/D42126
  llvm-svn: 322900
* [AArch64][GlobalISel] Add isel support for global values in the large code model (Amara Emerson, 2018-01-18, 1 file, -0/+61)
  Fixes PR35958.
  Differential Revision: https://reviews.llvm.org/D42175
  llvm-svn: 322878
* [X86][SSE] Regenerate vector promotion tests (Simon Pilgrim, 2018-01-18, 1 file, -19/+46)
  llvm-svn: 322877