| Commit message | Author | Age | Files | Lines |
...

names for 16-bit masks.
This matches gcc and icc despite not being documented in the Intel Intrinsics Guide.
llvm-svn: 340798
64-bit mask registers.
This also adds a second intrinsic name for the 16-bit mask versions.
These intrinsics match gcc and icc. They just aren't published in the Intel Intrinsics Guide so I only recently found they existed.
llvm-svn: 340719
EmitX86BuiltinExpr() emits all args into Ops at the beginning, so don't do that
work again.
This changes behavior: if, for example, ++a was passed as an argument, we previously incremented a twice. This change fixes that bug.
https://reviews.llvm.org/D50979
llvm-svn: 340348
This is a partial retry of rL340137 (reverted at rL340138 because of gcc host compiler crashing)
with 1 change:
Remove the changes to make microsoft builtins also use the LLVM intrinsics.
This exposes the LLVM funnel shift intrinsics as more familiar bit rotation functions in clang
(when both halves of a funnel shift are the same value, it's a rotate).
We're free to name these as we want because we're not copying gcc, but if there's some other existing art (e.g., the Microsoft ops) that we want to replicate, we can change the names.
The funnel shift intrinsics were added here:
https://reviews.llvm.org/D49242
With improved codegen in:
https://reviews.llvm.org/rL337966
https://reviews.llvm.org/rL339359
And basic IR optimization added in:
https://reviews.llvm.org/rL338218
https://reviews.llvm.org/rL340022
...so these are expected to produce asm output that's equal to or better than the multi-instruction alternatives using primitive C/IR ops.
In the motivating loop example from PR37387:
https://bugs.llvm.org/show_bug.cgi?id=37387#c7
...we get the expected 'rolq' x86 instructions if we substitute the rotate builtin into the source.
Differential Revision: https://reviews.llvm.org/D50924
llvm-svn: 340141
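
A rough usage sketch (not part of the commit; it assumes the rotate builtin names added by D50924) showing how the builtins are called from C and what they are expected to lower to:

  #include <stdint.h>

  // Rotate the first argument by the second; expected to lower to a single
  // rotate instruction on targets that have one.
  uint32_t rol32(uint32_t x, uint32_t n) {
    return __builtin_rotateleft32(x, n);   // e.g. 'roll' on x86
  }

  uint64_t ror64(uint64_t x, uint64_t n) {
    return __builtin_rotateright64(x, n);  // e.g. 'rorq' on x86
  }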
At least a couple of bots (gcc host compiler on PPC only?) are showing the compiler dying while trying to compile.
llvm-svn: 340138
This is a retry of rL340135 (reverted at rL340136 because of gcc host compiler crashing)
with 2 changes:
1. Move the code into a helper to reduce code duplication (and hopefully work around the crash).
2. The original commit had a formatting bug in the docs (missing an underscore).
Original commit message:
This exposes the LLVM funnel shift intrinsics as more familiar bit rotation functions in clang
(when both halves of a funnel shift are the same value, it's a rotate).
We're free to name these as we want because we're not copying gcc, but if there's some other existing art (e.g., the Microsoft ops that are modified in this patch) that we want to replicate, we can change the names.
The funnel shift intrinsics were added here:
https://reviews.llvm.org/D49242
With improved codegen in:
https://reviews.llvm.org/rL337966
https://reviews.llvm.org/rL339359
And basic IR optimization added in:
https://reviews.llvm.org/rL338218
https://reviews.llvm.org/rL340022
...so these are expected to produce asm output that's equal to or better than the multi-instruction alternatives using primitive C/IR ops.
In the motivating loop example from PR37387:
https://bugs.llvm.org/show_bug.cgi?id=37387#c7
...we get the expected 'rolq' x86 instructions if we substitute the rotate builtin into the source.
Differential Revision: https://reviews.llvm.org/D50924
llvm-svn: 340137
At least a couple of bots (PPC only?) are showing the compiler dying while trying to compile:
http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/11065/steps/build%20stage%201/logs/stdio
http://lab.llvm.org:8011/builders/clang-ppc64be-linux-lnt/builds/18267/steps/build%20stage%201/logs/stdio
llvm-svn: 340136
This exposes the LLVM funnel shift intrinsics as more familiar bit rotation functions in clang
(when both halves of a funnel shift are the same value, it's a rotate).
We're free to name these as we want because we're not copying gcc, but if there's some other existing art (e.g., the Microsoft ops that are modified in this patch) that we want to replicate, we can change the names.
The funnel shift intrinsics were added here:
D49242
With improved codegen in:
rL337966
rL339359
And basic IR optimization added in:
rL338218
rL340022
...so these are expected to produce asm output that's equal to or better than the multi-instruction alternatives using primitive C/IR ops.
In the motivating loop example from PR37387:
https://bugs.llvm.org/show_bug.cgi?id=37387#c7
...we get the expected 'rolq' x86 instructions if we substitute the rotate builtin into the source.
Differential Revision: https://reviews.llvm.org/D50924
llvm-svn: 340135
r337619 added __shiftleft128 / __shiftright128 as functions in intrin.h.
Microsoft's STL plans to use these functions, and it uses intrin0.h, which contains only declarations of built-ins, to avoid pulling the huge intrin.h header into the standard library headers. That requires these functions to be real built-ins.
https://reviews.llvm.org/D50907
llvm-svn: 340048
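
For reference, a small sketch of the documented Microsoft semantics these builtins implement (MSVC-compatible targets; the wrapper names are illustrative):

  #include <intrin.h>

  // __shiftleft128: shift the 128-bit value (hi:lo) left by 'shift' bits and
  // return the high 64 bits of the result.
  unsigned __int64 shl128_high(unsigned __int64 lo, unsigned __int64 hi,
                               unsigned char shift) {
    return __shiftleft128(lo, hi, shift);
  }

  // __shiftright128: shift (hi:lo) right by 'shift' bits and return the low
  // 64 bits of the result.
  unsigned __int64 shr128_low(unsigned __int64 lo, unsigned __int64 hi,
                              unsigned char shift) {
    return __shiftright128(lo, hi, shift);
  }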
builtin instead.
llvm-svn: 339845
Summary: This is the patch that lowers x86 intrinsics to native IR in order to enable optimizations.
Reviewers: craig.topper, spatel, RKSimon
Reviewed By: craig.topper
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D46892
llvm-svn: 339651
Reviewers: teemperor!
Subscribers: jholewinski, whisperity, jfb, cfe-commits
Differential Revision: https://reviews.llvm.org/D50350
llvm-svn: 339385
gcc defines an intrinsic called __builtin_clrsb, which counts the number of extra sign bits in a number. This is equivalent to counting the number of leading zeros of a positive number, or the number of leading ones of a negative number, and subtracting one from the result. Since we can't count leading ones, we need to invert negative numbers and count zeros.
This patch causes the builtin to be expanded inline, while gcc emits a call to a libgcc function like clrsbdi2. But this is similar to what we already do for popcnt, and I don't think compiler-rt supports clrsbdi2.
Differential Revision: https://reviews.llvm.org/D50168
llvm-svn: 339282
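
A short illustration of the semantics described above (a sketch, not part of the patch): for a 32-bit int, __builtin_clrsb returns the number of bits following the sign bit that are identical to it.

  #include <assert.h>

  int main(void) {
    // 7 = 0b0...0111: the sign bit 0 is followed by 28 more zeros.
    assert(__builtin_clrsb(7) == 28);
    // -2 = 0b1...10: the sign bit 1 is followed by 30 more ones.
    assert(__builtin_clrsb(-2) == 30);
    // For a negative value, invert first and then count leading zeros.
    assert(__builtin_clrsb(-2) == __builtin_clz(~(-2)) - 1);
    return 0;
  }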
Always emit alloca in entry block for enqueue_kernel builtin.
Ensures the statically sized alloca is not converted to DYNAMIC_STACKALLOC
later because it is not in the entry block.
llvm-svn: 339150
This reverts commit r338899, it was causing ASan test failures on sanitizer-x86_64-linux-fast.
llvm-svn: 338904
Ensures the statically sized alloca is not converted to DYNAMIC_STACKALLOC
later because it is not in the entry block.
Differential Revision: https://reviews.llvm.org/D50104
llvm-svn: 338899
Summary:
Add support for atomic.wait / atomic.wake builtins based on the Wasm
thread proposal.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, sunfish, cfe-commits
Differential Revision: https://reviews.llvm.org/D49396
llvm-svn: 338771
The way address space declarations for builtins currently work
is nearly useless. The code assumes the address space used for
builtins is a confusingly named "target address space" from user
code using __attribute__((address_space(N))) that matches
the builtin declaration. There's no way to use this to declare
a builtin that returns a language specific address space.
The terminology used is highly confusing since it has nothing to do with the address space selected by the target to use for a language address space.
This feature is essentially unused as-is. AMDGPU and NVPTX
are the only in-tree targets attempting to use this. The AMDGPU
builtins certainly do not behave as intended (i.e. all of the
builtins returning pointers can never compile because the numbered
address space never matches the expected named address space).
Some of the NVPTX builtins are missing tests, and the others seem to rely on an implicit addrspacecast.
Change the used address space for builtins based on a target
hook to allow using a language address space for a builtin.
This allows the same builtin declaration to be used for multiple
languages with similarly purposed address spaces (e.g. the same
AMDGPU builtin can be used in OpenCL and CUDA even though the
constant address spaces are arbitrarily different).
This breaks the possibility of using arbitrary numbered
address spaces alongside the named address spaces for builtins.
If this is an issue we probably need to introduce another builtin
declaration character to distinguish language address spaces from
so-called "target address spaces".
llvm-svn: 338707
sed -Ei 's/[[:space:]]+$//' include/**/*.{def,h,td} lib/**/*.{cpp,h}
llvm-svn: 338291
This patch adds support for vrndi_f32() and vrndiq_f32()
intrinsics in AArch32 mode and for vrndns_f32() intrinsic in
AArch64 mode.
Differential Revision: https://reviews.llvm.org/D48829
llvm-svn: 337690
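
A brief usage sketch (assuming an ARMv8 target with the relevant features enabled; the wrapper names are illustrative, not from the patch):

  #include <arm_neon.h>

  // vrndiq_f32: round each lane to an integral value using the current
  // rounding mode.
  float32x4_t round_lanes(float32x4_t v) {
    return vrndiq_f32(v);
  }

  // vrndns_f32 (AArch64): round a scalar float to nearest, ties to even.
  float round_scalar(float x) {
    return vrndns_f32(x);
  }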
As documented here: https://software.intel.com/en-us/node/682969 and
https://software.intel.com/en-us/node/523346, cpu_dispatch is an ICC feature that provides function multiversioning.
This feature is implemented with two attributes: First, cpu_specific,
which specifies the individual function versions. Second, cpu_dispatch,
which specifies the location of the resolver function and the list of
resolvable functions.
This is valuable since it provides a mechanism where the resolver's TU
can be specified in one location, and the individual implementations each in their own translation units.
The goal of this patch is to be source-compatible with ICC, though this implementation diverges from the ICC implementation in a few ways:
1- Linux x86/64 only: This implementation uses ifuncs in order to
properly dispatch functions. This is a valuable performance benefit
over the ICC implementation. A future patch will be provided to enable
this feature on Windows, but it will obviously more closely fit ICC's
implementation.
2- CPU Identification functions: ICC uses a set of custom functions to identify
the feature list of the host processor. This patch uses the cpu_supports
functionality in order to better align with 'target' multiversioning.
3- cpu_dispatch function def/decl: ICC's cpu_dispatch requires that the function
marked cpu_dispatch be an empty definition. This patch supports that as well,
however declarations are also permitted, since the linker will solve the
issue of multiple emissions.
Differential Revision: https://reviews.llvm.org/D47474
llvm-svn: 337552
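
A minimal sketch of the source pattern this enables (following the ICC-style syntax described above; the function name and CPU names are just examples):

  // Two versions of the same function, each compiled for a specific CPU.
  int __attribute__((cpu_specific(generic)))   work(void) { return 0; }
  int __attribute__((cpu_specific(ivybridge))) work(void) { return 1; }

  // The resolver: an empty definition (or a declaration) listing the versions.
  int __attribute__((cpu_dispatch(generic, ivybridge))) work(void);

  int main(void) {
    return work();  // dispatched through an ifunc on Linux x86-64
  }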
llvm-svn: 337530
The commit for
https://reviews.llvm.org/D49424
missed the comment about the extraneous semicolons. Remove them.
llvm-svn: 337451
The codegen for this builtin was initially implemented to match GCC. However, due to interest from users, GCC changed its behaviour to account for the big-endian bias of the instruction and correct it. This patch brings the handling in line with GCC.
Fixes https://bugs.llvm.org/show_bug.cgi?id=38192
Differential Revision: https://reviews.llvm.org/D49424
llvm-svn: 337449
Summary:
Added the following intrinsics:
_BitScanForward, _BitScanReverse, _BitScanForward64, _BitScanReverse64
_InterlockedAnd64, _InterlockedDecrement64, _InterlockedExchange64,
_InterlockedExchangeAdd64, _InterlockedExchangeSub64,
_InterlockedIncrement64, _InterlockedOr64, _InterlockedXor64.
Reviewers: compnerd, mstorsjo, rnk, javed.absar
Reviewed By: mstorsjo
Subscribers: kristof.beyls, chrib, llvm-commits
Differential Revision: https://reviews.llvm.org/D49445
llvm-svn: 337327
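
A brief usage sketch for two of the added intrinsics (MSVC-compatible mode; the wrapper names are illustrative):

  #include <intrin.h>

  // Return the index of the lowest set bit in a 64-bit mask, or -1 if none.
  int lowest_set_bit(unsigned __int64 mask) {
    unsigned long index;
    if (_BitScanForward64(&index, mask))
      return (int)index;
    return -1;
  }

  // Atomically increment a 64-bit counter and return the new value.
  __int64 bump(volatile __int64 *counter) {
    return _InterlockedIncrement64(counter);
  }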
__builtin_ia32_divsd_round_mask.
llvm-svn: 336628
is suitable for use in scalar mask intrinsics.
This will convert the i8 mask argument to <8 x i1>, extract an i1, and then emit a select instruction. This replaces the '(__U & 1)' and ternary operator used in some of the intrinsics. The old sequence was lowered to a scalar 'and' plus a compare. The new sequence uses an i1 vector that will interoperate better with other mask intrinsics.
This removes the need to handle div_ss/sd specially in CGBuiltin.cpp. A follow-up patch will add the GCCBuiltin name back in llvm and remove the custom handling.
I made some adjustments to the legacy move_ss/sd intrinsics, which we reuse here, to do a simpler extract and insert instead of two extracts and two inserts or a shuffle.
llvm-svn: 336622
width. Add a min_vector_width function attribute and tag all x86 intrinsics with it.
This is part of an ongoing attempt at making 512-bit vectors illegal in the X86 backend type legalizer due to CPU frequency penalties associated with wide vectors on Skylake Server CPUs. We want the loop vectorizer to be able to emit IR containing wide vectors as intermediate operations in vectorized code and allow these wide vectors to be legalized to 256 bits by the X86 backend even though we are targeting a CPU that supports 512-bit vectors. This is similar to what happens with an AVX2 CPU: the vectorizer can emit wide vectors and the backend will split them. We want this splitting behavior, but still be able to use new Skylake instructions that work on 256-bit vectors and support things like masking and gather/scatter.
Of course if the user uses explicit vector code in their source code we need to not split those operations. Especially if they have used any of the 512-bit vector intrinsics from immintrin.h. And we need to make it so that merely using the intrinsics produces the expected code in order to be backwards compatible.
To support this goal, this patch adds a new IR function attribute "min-legal-vector-width" that can indicate the need for a minimum vector width to be legal in the backend. We need to ensure this attribute is set to the largest vector width needed by any intrinsics from immintrin.h that the function uses. The inliner will be responsible for merging this attribute when a function is inlined. We may also need a way to limit inlining, but we can discuss that in the future.
To make things more complicated, there are two different ways intrinsics are implemented in immintrin.h: either as an always_inline function containing calls to builtins (which can be target specific or target independent) or vector extension code, or as a macro wrapper around a target specific builtin. I believe I've removed all cases where the macro was around a target independent builtin.
To support the always_inline function case this patch adds attribute((min_vector_width(128))) that can be used to tag these functions with their vector width. All x86 intrinsic functions that operate on vectors have been tagged with this attribute.
To support the macro case, all x86 specific builtins have also been tagged with the vector width that they require. Use of any builtin with this property will implicitly increase the min_vector_width of the function that calls it. I've done this as a new property in the attribute string for the builtin rather than basing it on the type string so that we can opt into it on a per builtin basis and avoid any impact to target independent builtins.
There will be future work to support vectors passed as function arguments and supporting inline assembly. And whatever else we can find that isn't covered by this patch.
Special thanks to Chandler who suggested this direction and reviewed a preview version of this patch. And thanks to Eric Christopher who has had many conversations with me about this issue.
Differential Revision: https://reviews.llvm.org/D48617
llvm-svn: 336583
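
A sketch of the header-side always_inline pattern described above (the wrapper name is illustrative; compile with AVX enabled, e.g. -mavx):

  #include <immintrin.h>

  // An always_inline wrapper tagged with the minimum vector width it needs,
  // mirroring how the x86 intrinsic headers are annotated by this patch.
  static __inline__ __m256d
  __attribute__((__always_inline__, __min_vector_width__(256)))
  my_add_pd(__m256d a, __m256d b) {
    return _mm256_add_pd(a, b);
  }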
This allows us to handle masking in a very similar way to the default rounding version that uses llvm.fma.
llvm-svn: 336507
sure we optimize the all ones mask case.
This case occurs in the intrinsic headers so we should avoid emitting the mask in those cases.
Factor the code into a helper function to make this easy.
llvm-svn: 336472
native IR using llvm.fma intrinsic.
This generates some extra zeroing currently, but we should be able to quickly address that with some isel patterns.
llvm-svn: 336417
fmaddsub/fmsubadd IR emission.
Shufflevector is easier to generate and matches what the backend pattern matches without relying on constant selects being turned into shuffles.
While I was there I also made the IR regular expressions a little stricter to ensure operand order on the shuffle.
llvm-svn: 336388
This patch removes an optimization used with the TRUE/FALSE
predicates, as was suggested in https://reviews.llvm.org/D45616
for r335339.
The optimization was buggy, since r335339 used it also
for *_mask builtins, without actually applying the mask -- the
mask argument was just ignored.
Reviewers: craig.topper, uriel.k, RKSimon, andrew.w.kaylor, spatel, scanon, efriedma
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D48715
llvm-svn: 336355
builtins instead.
llvm-svn: 335945
name to match llvm.
llvm-svn: 335745
This patch reworks the support for dup NEON intrinsics as
described in D48439.
Differential Revision: https://reviews.llvm.org/D48440
llvm-svn: 335734
implement the mask input argument using an 'and' IR instruction.
Additional IR is emitted to convert between scalar and vXi1 type to match the expected software interface for the builtin that clang exposes.
llvm-svn: 335564
Summary:
Lowering some vector comparison builtins to fcmp IR instructions.
This ignores the signaling behaviour specified in the predicate
argument of said builtins.
Affected AVX512 builtins:
__builtin_ia32_cmpps128_mask
__builtin_ia32_cmpps256_mask
__builtin_ia32_cmpps512_mask
__builtin_ia32_cmppd128_mask
__builtin_ia32_cmppd256_mask
__builtin_ia32_cmppd512_mask
Reviewers: craig.topper, uriel.k, RKSimon, andrew.w.kaylor, spatel, scanon, efriedma
Reviewed By: craig.topper, spatel, efriedma
Differential Revision: https://reviews.llvm.org/D45616
llvm-svn: 335339
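
For context, a sketch of how these builtins are typically reached from user code (the intrinsic, predicate, and wrapper name are chosen for illustration; requires AVX-512F):

  #include <immintrin.h>

  // Compare lanes of a and b for ordered less-than; each bit of the returned
  // mask corresponds to one float lane.
  __mmask16 lanes_less_than(__m512 a, __m512 b) {
    return _mm512_cmp_ps_mask(a, b, _CMP_LT_OS);
  }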
D48464 contains changes that will loosen some of the range checks in SemaChecking to a DefaultError warning that can be disabled.
This patch adds explicit masking to avoid using the upper bits of immediates to gracefully handle the warning being disabled.
llvm-svn: 335308
Summary: All *_sqrt_round_s[s|d] intrinsics should execute a square root on the zeroth element of B (Ops[1]) and insert it into A (Ops[0]), not the other way around.
Reviewers: itaraban, craig.topper
Reviewed By: craig.topper
Subscribers: craig.topper, cfe-commits
Differential Revision: https://reviews.llvm.org/D48288
llvm-svn: 334964
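
To illustrate the corrected operand order (a sketch; one intrinsic of the affected family, with an illustrative wrapper name; requires AVX-512F):

  #include <immintrin.h>

  // Element 0 of the result is sqrt(b[0]) computed with the given rounding
  // mode; elements 1..3 are copied from a.
  __m128 sqrt_low_of_b(__m128 a, __m128 b) {
    return _mm_sqrt_round_ss(a, b, _MM_FROUND_CUR_DIRECTION);
  }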
Reviewers: craig.topper, spatel, RKSimon, igorb, uriel.k
Reviewed By: craig.topper
Subscribers: tkrupa, cfe-commits
Differential Revision: https://reviews.llvm.org/D41168
llvm-svn: 334850
llvm-svn: 334820
__builtin_ia32_pslldqi128_byteshift and similar for other sizes. Remove the multiply by 8 from the header files.
The previous names took the shift amount in bits to match gcc and required a multiply by 8 in the header. This created a misleading error message when we check the range of the immediate passed to the builtin, since the allowed range also got multiplied by 8.
This commit changes the builtins to use a byte shift amount to match the underlying instruction and the Intel intrinsic.
Fixes the remaining issue from PR37795.
llvm-svn: 334773
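
For context, a sketch of the corresponding Intel intrinsic, which already takes its immediate in bytes and is what the renamed builtins now match (the wrapper name is illustrative):

  #include <emmintrin.h>

  // _mm_slli_si128 shifts the whole 128-bit value left by N bytes; the
  // builtin underneath now takes the same byte count directly.
  __m128i shift_left_two_bytes(__m128i v) {
    return _mm_slli_si128(v, 2);
  }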
Summary: Lowering add, sub, mul, and div mask scalar intrinsic calls
to native IR.
Reviewers: craig.topper, RKSimon, spatel, sroland
Reviewed By: craig.topper
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D47979
llvm-svn: 334741
llvm-svn: 334696
llvm-svn: 334693
Summary: These intrinsics result in hint instructions. They are provided here for MSVC ARM64 compatibility.
Reviewers: mstorsjo, compnerd, javed.absar
Reviewed By: mstorsjo
Subscribers: kristof.beyls, chrib, cfe-commits
Differential Revision: https://reviews.llvm.org/D48132
llvm-svn: 334639
This was broken when the builtin was added in r334249.
llvm-svn: 334422
to implement expandload/compressstore builtins.
Summary: We've had these target independent intrinsics for at least a year and a half. Looks like they do exactly what we need here and the backend already supports them.
Reviewers: RKSimon, delena, spatel, GBuella
Reviewed By: RKSimon
Subscribers: cfe-commits, llvm-commits
Differential Revision: https://reviews.llvm.org/D47693
llvm-svn: 334366
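
For context, a sketch of intrinsics backed by these builtins, which now lower to the target-independent llvm.masked.expandload / llvm.masked.compressstore intrinsics (requires AVX-512F; wrapper names are illustrative):

  #include <immintrin.h>

  // Store only the lanes selected by the mask, packed contiguously at dst.
  void compress_store(float *dst, __mmask16 mask, __m512 v) {
    _mm512_mask_compressstoreu_ps(dst, mask, v);
  }

  // Load contiguous elements from p, expanding them into the masked lanes;
  // unselected lanes are taken from src.
  __m512 expand_load(__m512 src, __mmask16 mask, const float *p) {
    return _mm512_mask_expandloadu_ps(src, mask, p);
  }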
We currently support them only in AArch64. The NEON Reference,
however, says they are 'ARMv7, ARMv8' intrinsics.
Differential Revision: https://reviews.llvm.org/D47446
llvm-svn: 334362