to match gcc and Intel's documentation.
The avx2 gather intrinsics are documented to use 'int', 'long long', 'float', or 'double' pointer types, so I'm leaving those as-is. This matches gcc.
llvm-svn: 350696
This adds
_mm_and_epi32, _mm_and_epi64
_mm_andnot_epi32, _mm_andnot_epi64
_mm_or_epi32, _mm_or_epi64
_mm_xor_epi32, _mm_xor_epi64
_mm256_and_epi32, _mm256_and_epi64
_mm256_andnot_epi32, _mm256_andnot_epi64
_mm256_or_epi32, _mm256_or_epi64
_mm256_xor_epi32, _mm256_xor_epi64
_mm_loadu_epi32, _mm_loadu_epi64
_mm_load_epi32, _mm_load_epi64
_mm256_loadu_epi32, _mm256_loadu_epi64
_mm256_load_epi32, _mm256_load_epi64
_mm512_loadu_epi32, _mm512_loadu_epi64
_mm512_load_epi32, _mm512_load_epi64
_mm_storeu_epi32, _mm_storeu_epi64
_mm_store_epi32, _mm_store_epi64
_mm256_storeu_epi32, _mm256_storeu_epi64
_mm256_store_epi32, _mm256_store_epi64
_mm512_storeu_epi32, _mm512_storeu_epi64
_mm512_store_epi32, _mm512_store_epi64
llvm-svn: 344861
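For reference, intrinsics like these are typically thin always_inline wrappers in the headers. A minimal sketch of what one of the new logic intrinsics can look like, assuming the vector typedefs from <immintrin.h> (the name suffix, attributes, and body here are illustrative, not the exact header text):

    #include <immintrin.h>

    /* Sketch: bitwise AND expressed on 32-bit unsigned lanes; clang lowers the
       vector '&' directly to an 'and' IR instruction. */
    static __inline__ __m256i __attribute__((__always_inline__, __nodebug__))
    and_epi32_sketch(__m256i __a, __m256i __b)
    {
      return (__m256i)((__v8su)__a & (__v8su)__b);
    }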
This patch lowers the _mm[256|512]_cvtepi{64|32|16}_epi{32|16|8} intrinsics to
native IR in cases where the result's length is less than 128 bits.
The resulting IR for 256-bit inputs is folded into VPMOV instructions, while for 128-bit inputs vpshufb (or, in the 64-to-32-bit case, vinsertps) instructions are generated instead.
Differential Revision: https://reviews.llvm.org/D48712
llvm-svn: 336643
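A rough sketch of the kind of header-level lowering this describes, using the 256-bit i32-to-i8 case; the zero padding of the upper lanes and the helper typedef are illustrative assumptions, not the exact header code:

    #include <immintrin.h>

    typedef char __v8qi_sketch __attribute__((__vector_size__(8)));

    /* Sketch: truncate eight i32 elements to eight i8 elements (64 bits of
       data), then pad the result back out to a 128-bit vector. */
    static __inline__ __m128i cvtepi32_epi8_sketch(__m256i __A)
    {
      __v8qi_sketch __t = __builtin_convertvector((__v8si)__A, __v8qi_sketch);
      return (__m128i)__builtin_shufflevector(
          __t, (__v8qi_sketch){0, 0, 0, 0, 0, 0, 0, 0},
          0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15);
    }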
width. Add a min_vector_width function attribute and tag all x86 intrinsics with it.
This is part of an ongoing attempt at making 512-bit vectors illegal in the X86 backend type legalizer due to CPU frequency penalties associated with wide vectors on Skylake Server CPUs. We want the loop vectorizer to be able to emit IR containing wide vectors as intermediate operations in vectorized code and allow these wide vectors to be legalized to 256 bits by the X86 backend even though we are targeting a CPU that supports 512-bit vectors. This is similar to what happens with an AVX2 CPU: the vectorizer can emit wide vectors and the backend will split them. We want this splitting behavior, but we still want to be able to use new Skylake instructions that work on 256-bit vectors and support things like masking and gather/scatter.
Of course, if the user writes explicit vector code in their source we need to not split those operations, especially if they have used any of the 512-bit vector intrinsics from immintrin.h. And we need to make it so that merely using the intrinsics produces the expected code in order to be backwards compatible.
To support this goal, this patch adds a new IR function attribute "min-legal-vector-width" that can indicate the need for a minimum vector width to be legal in the backend. We need to ensure this attribute is set to the largest vector width needed by any intrinsics from immintrin.h that the function uses. The inliner will be responsible for merging this attribute when a function is inlined. We may also need a way to limit inlining, but we can discuss that in the future.
To make things more complicated, there are two different ways intrinsics are implemented in immintrin.h: either as an always_inline function containing calls to builtins (which can be target specific or target independent) or vector extension code, or as a macro wrapper around a target specific builtin. I believe I've removed all cases where the macro was around a target independent builtin.
To support the always_inline function case, this patch adds __attribute__((min_vector_width(128))) that can be used to tag these functions with their vector width. All x86 intrinsic functions that operate on vectors have been tagged with this attribute.
To support the macro case, all x86 specific builtins have also been tagged with the vector width that they require. Use of any builtin with this property will implicitly increase the min_vector_width of the function that calls it. I've done this as a new property in the attribute string for the builtin rather than basing it on the type string so that we can opt into it on a per-builtin basis and avoid any impact to target independent builtins.
There will be future work to support vectors passed as function arguments and inline assembly, and whatever else we find that isn't covered by this patch.
Special thanks to Chandler, who suggested this direction and reviewed a preview version of this patch, and thanks to Eric Christopher, who has had many conversations with me about this issue.
Differential Revision: https://reviews.llvm.org/D48617
llvm-svn: 336583
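For illustration, the always_inline case ends up looking roughly like the sketch below. The attribute spelling __min_vector_width__ matches how clang's headers tag wrappers after this change; the macro name, target feature, typedef, and function body here are placeholders:

    /* Sketch: an intrinsic-style wrapper tagged with its minimum vector width so
       the backend knows 128-bit vectors must stay legal in callers that use it. */
    #define __DEFAULT_FN_ATTRS128_SKETCH                                        \
      __attribute__((__always_inline__, __nodebug__, __target__("avx512vl"),    \
                     __min_vector_width__(128)))

    typedef int __v4si_sketch __attribute__((__vector_size__(16)));

    static __inline__ __v4si_sketch __DEFAULT_FN_ATTRS128_SKETCH
    example_add_epi32(__v4si_sketch __a, __v4si_sketch __b)
    {
      return __a + __b;
    }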
that cause extra bitcasts to be emitted in the IR.
Found via imprecise grepping of the -O0 IR. There could still be more bugs out there.
llvm-svn: 336487
All of these were found by grepping through IR from the builtin tests for extra trunc and zext/sext instructions that shouldn't have been there.
Some of these were real bugs where we lost bits from the user input:
_mm512_mask_broadcast_f32x8
_mm512_maskz_broadcast_f32x8
_mm512_mask_broadcast_i32x8
_mm512_maskz_broadcast_i32x8
_mm256_mask_cvtusepi16_storeu_epi8
llvm-svn: 336042
instead.
llvm-svn: 336036
I'm looking into making the select builtins require avx512f, avx512bw, or avx512vl since masking operations generally require those features.
The extract builtins are funny because the 512-bit versions return a 128- or 256-bit vector with masking even when avx512vl is not supported.
llvm-svn: 334330
checking.
llvm-svn: 334311
checking and immediate range checking.
Test changes are due to differences in how we generate undef elements now. We also changed the types used for extractf128_si256/insertf128_si256 to match the signature of the builtin that previously existed, which this patch resurrects. This also matches gcc.
llvm-svn: 334261
target feature checking and immediate range checking.
llvm-svn: 334244
We still emit shufflevector instructions; we just do it from CGBuiltin.cpp now. This ensures the intrinsics that use this are only available on CPUs that support the feature.
I also added range checking to the immediate, but only checked it is 8 bits or smaller. We should maybe be stricter since we never use all 8 bits, but gcc doesn't seem to do that.
llvm-svn: 334237
__m512(i|d)/__m256(i|d)/__m128(i|d) first.
The majority of the cases were correct. This fixes the few that weren't.
I also removed some superfluous parentheses in non-macros that confused my attempts at grepping for missing casts.
llvm-svn: 333615
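The casting idiom being audited looks roughly like this; the builtin call below is quoted from memory as an illustration of the pattern and may not match the current header text exactly:

    #include <immintrin.h>

    /* Sketch: the macro argument A is cast to the intended __m512i type first,
       and only then to the builtin's element-vector type, so the macro accepts
       the same arguments a function taking __m512i would. */
    #define example_slli_epi32(A, B) \
      ((__m512i)__builtin_ia32_pslldi512((__v16si)(__m512i)(A), (int)(B)))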
I think this is a holdover from when we used to declare variables inside the macros, and it's been copied and pasted forward for years every time a new macro intrinsic gets added.
Interestingly this caused some tests for IRGen to be slightly more optimized. We now return a zeroinitializer directly instead of going through a store+load.
It also removed a bogus error message on another test.
llvm-svn: 333613
Intel Intrinsics Guide.
We had quite a few for different element sizes of integers, sometimes with strange target features attached to them.
We only need a single version for each of __m128i, __m256i, and __m512i with the target feature that first introduced those types.
llvm-svn: 333568
builtin that returns void.
Found by running the intrinsic headers through -pedantic -ansi.
llvm-svn: 333563
This patch replaces all packed (and scalar without rounding
mode) fused intrinsics with fmadd/fmaddsub variations.
Then fmadd/fmaddsub are lowered to native IR.
Patch by tkrupa
Reviewers: craig.topper, sroland, spatel, RKSimon
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D47444
llvm-svn: 333555
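As a hedged sketch of the resulting header shape, assuming the unmasked fmadd builtin and the per-type select builtin (names quoted from memory; the exact signatures in the header may differ), a masked 128-bit variant looks roughly like:

    #include <immintrin.h>

    /* Sketch: an unmasked fmadd builtin plus a select builtin implements the
       masked form; needs fma and avx512vl enabled to compile. */
    static __inline__ __m128 mask_fmadd_ps_sketch(__m128 __A, __mmask8 __U,
                                                  __m128 __B, __m128 __C)
    {
      return (__m128)__builtin_ia32_selectps_128(
          (__mmask8)__U,
          __builtin_ia32_vfmaddps((__v4sf)__A, (__v4sf)__B, (__v4sf)__C),
          (__v4sf)__A);
    }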
to a single version without masking. Use select builtins with an appropriate operand instead.
llvm-svn: 333387
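The replacement pattern, sketched with a masked 256-bit add as the example (assuming the __builtin_ia32_selectd_256 select builtin; exact header text may differ):

    #include <immintrin.h>

    /* Sketch: one unmasked operation plus a per-element-type select builtin
       replaces a dedicated masked builtin; needs avx512vl to compile. */
    static __inline__ __m256i mask_add_epi32_sketch(__m256i __W, __mmask8 __U,
                                                    __m256i __A, __m256i __B)
    {
      return (__m256i)__builtin_ia32_selectd_256(
          (__mmask8)__U, (__v8si)_mm256_add_epi32(__A, __B), (__v8si)__W);
    }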
CGBuiltin.cpp. Just wrap a select builtin around them in the header file instead.
llvm-svn: 333061
packed float conversion intrinsics.
I believe this is safe assuming the default FP environment. The conversion might be inexact, but it can never overflow the FP type, so this shouldn't be undefined behavior for the uitofp/sitofp instructions.
We already do something similar for scalar conversions.
Differential Revision: https://reviews.llvm.org/D46863
llvm-svn: 332882
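A sketch of the unsigned case via the generic conversion builtin, assuming the usual header typedefs (exact header code may differ):

    #include <immintrin.h>

    /* Sketch: u32 -> float conversion through __builtin_convertvector, which
       emits a uitofp instruction rather than an x86-specific intrinsic call. */
    static __inline__ __m512 cvtepu32_ps_sketch(__m512i __A)
    {
      return (__m512)__builtin_convertvector((__v16su)__A, __v16sf);
    }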
in IR instead.
Someday maybe we'll use selects for all the builtins.
llvm-svn: 332825
the avx512 truncate builtins.
The masking doesn't work right in the backend for the ones that produce byte or word elements without avx512bw.
llvm-svn: 332322
builtins.
As long as the destination type is a 256- or 128-bit vector with the same number of elements, we can use __builtin_convertvector to directly generate a trunc IR instruction, which will be handled natively by the backend.
Differential Revision: https://reviews.llvm.org/D46742
llvm-svn: 332266
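A sketch of the pattern for the 512-to-256-bit case, assuming the usual header typedefs (exact header code may differ):

    #include <immintrin.h>

    /* Sketch: same element count, narrower elements, so __builtin_convertvector
       produces a plain trunc from <8 x i64> to <8 x i32>. */
    static __inline__ __m256i cvtepi64_epi32_sketch(__m512i __A)
    {
      return (__m256i)__builtin_convertvector((__v8di)__A, __v8si);
    }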
The unmasked versions already didn't have this restriction. I don't think gcc or icc limit these to 64-bit mode, so we shouldn't either.
llvm-svn: 330681
sse/avx builtins and a select.
llvm-svn: 326039
__builtin_ia32_permvarsi256_mask and use the avx2 unmasked versions and a select instead.
llvm-svn: 326022
Change the intrinsic header files to lower the test and testn intrinsics to IR code.
Removed the test and testn builtins from clang.
Differential Revision: https://reviews.llvm.org/D38737
llvm-svn: 318035
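A hedged sketch of what the header-level form can look like for the 128-bit i32 case, built from existing intrinsics rather than a dedicated builtin (the exact header code may differ):

    #include <immintrin.h>

    /* Sketch: test is '(A & B) != 0' per element; needs avx512f/avx512vl to
       compile. */
    static __inline__ __mmask8 test_epi32_mask_sketch(__m128i __A, __m128i __B)
    {
      return _mm_cmpneq_epi32_mask(_mm_and_si128(__A, __B), _mm_setzero_si128());
    }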
This patch, together with a matching llvm patch (https://reviews.llvm.org/D38671), implements the lowering of X86 shuffle i/f intrinsics to IR.
Differential Revision: https://reviews.llvm.org/D38672
Change-Id: I9b3c2f2b34323bd9ccb21d0c1832f848b88ec047
llvm-svn: 318025
macros that just pass the right comparison predicate value to the regular cmp intrinsic. Remove mask cmpeq/cmpgt builtins that are now unused.
This shortens the intrinsic headers a little and allows us to get rid of the cmpeq and cmpgt handling from CGBuiltin.cpp.
llvm-svn: 317506
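A sketch of the macro shape, assuming the _MM_CMPINT_* predicate constants from the AVX-512 headers (exact macro text may differ):

    #include <immintrin.h>

    /* Sketch: cmpeq/cmpgt become thin macros over the general compare intrinsic,
       passing the matching predicate value. */
    #define cmpeq_epi32_mask_sketch(A, B) \
      _mm_cmp_epi32_mask((A), (B), _MM_CMPINT_EQ)
    #define cmpgt_epi32_mask_sketch(A, B) \
      _mm_cmp_epi32_mask((A), (B), _MM_CMPINT_GT)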
Differential Revision: https://reviews.llvm.org/D38231
Change-Id: I80bbff9cbe93e4be54d8a761ef9723edf3f57c57
llvm-svn: 314102
This patch, together with a matching llvm patch (https://reviews.llvm.org/D37669), implements the lowering of X86 mask set1 intrinsics to IR.
Differential Revision: https://reviews.llvm.org/D37668
llvm-svn: 313624
Verified that the backend codegens this equally well.
llvm-svn: 292329
select and the avx unmasked builtins.
llvm-svn: 289338
llvm-svn: 287733
generic IR
Both the (V)CVTDQ2PD (i32 to f64) and (V)CVTUDQ2PD (u32 to f64) conversion instructions are lossless and can be safely represented as generic __builtin_convertvector calls instead of x86 intrinsics without affecting final codegen.
This patch removes the clang builtins and their use in the headers - a future patch will deal with removing the llvm intrinsics.
This is an extension patch to D20528 which dealt with the equivalent sse/avx cases.
Differential Revision: https://reviews.llvm.org/D26686
llvm-svn: 287088
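A sketch of the i32-to-f64 case, assuming the usual header typedefs (exact header code may differ):

    #include <immintrin.h>

    /* Sketch: take the low two i32 elements and convert them losslessly to
       double through the generic builtins, with no x86-specific intrinsic. */
    static __inline__ __m128d cvtepi32_pd_sketch(__m128i __a)
    {
      return (__m128d)__builtin_convertvector(
          __builtin_shufflevector((__v4si)__a, (__v4si)__a, 0, 1), __v2df);
    }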
unmasked builtins and a select.
This is part of a set of changes to allow InstCombine in the backend to optimize variable shifts without having to know about masking.
llvm-svn: 286757
element builtins over to the newly added unmasked builtins and a select.
This should also fix PR30691 since the new builtins are handled like the legacy builtins in the backend.
llvm-svn: 286714
shufflevectors and selects.
Unfortunately, the backend currently doesn't fold masks into the instructions correctly when they come from these shufflevectors. I'll work on that in a future commit.
llvm-svn: 285667
llvm-svn: 285545
shufflevectors and selects.
Unfortunately, the backend currently doesn't fold masks into the instructions correctly when they come from these shufflevectors. I'll work on that in a future commit.
llvm-svn: 285540
them with unmasked builtins and selects.
llvm-svn: 285539
unmasked builtins and a select.
llvm-svn: 285504
with unmasked builtins and a select.
llvm-svn: 285503
unmasked builtins with select instead.
llvm-svn: 285502
with selects and the older unmasked builtins.
llvm-svn: 284954
llvm-svn: 284927
and the older non-masked versions instead.
llvm-svn: 284924
builtins and replace with native operations.
We can't do the 512-bit ones because they take a rounding mode argument that we can't represent.
llvm-svn: 280635
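For context, "native operations" here means plain GNU vector arithmetic in the header; a hedged sketch of the idiom, using a generic add as the stand-in example (not necessarily an intrinsic touched by this commit):

    #include <immintrin.h>

    /* Sketch: element-wise arithmetic written directly on vector types, which
       clang lowers to the corresponding IR instruction instead of a builtin
       call. */
    static __inline__ __m256d add_pd_sketch(__m256d __a, __m256d __b)
    {
      return (__m256d)((__v4df)__a + (__v4df)__b);
    }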
llvm-svn: 280597
llvm-svn: 280596