| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
| |
I'm looking into making the select builtins require avx512f, avx512bw, or avx512vl since masking operations generally require those features.
The extract builtins are funny because the 512-bit versions return a 128 or 256 bit vector with masking even when avx512vl is not supported.
llvm-svn: 334330
|
|
|
|
|
|
| |
checking.
llvm-svn: 334311
|
|
|
|
|
|
| |
These builtins are all handled by CGBuiltin.cpp so it doesn't much matter what the immediate type is, but int matches the intrinsic spec.
llvm-svn: 334310
|
|
|
|
|
|
| |
immediate range checking.
llvm-svn: 334266
|
|
|
|
|
|
| |
and immediate range checking.
llvm-svn: 334265
|
|
|
|
|
|
| |
This was caught by the Header tests, but not the CodeGen tests.
llvm-svn: 334264
|
|
|
|
|
|
|
|
| |
checking and immediate range checking.
Test changes are due to differences in how we generate undef elements now. We also changed the types used for extractf128_si256/insertf128_si256 to match the signature of the builtin that previously existed which this patch resurrects. This also matches gcc.
llvm-svn: 334261
|
|
|
|
|
|
| |
checking.
llvm-svn: 334256
|
|
|
|
|
|
| |
feature requirements and check immediate range.
llvm-svn: 334249
|
|
|
|
|
|
| |
target feature checking and immediate range checking.
llvm-svn: 334244
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Adds support for these intrinsics, which are ARM and ARM64 only:
_interlockedbittestandreset_acq
_interlockedbittestandreset_rel
_interlockedbittestandreset_nf
_interlockedbittestandset_acq
_interlockedbittestandset_rel
_interlockedbittestandset_nf
Refactor the bittest intrinsic handling to decompose each intrinsic into
its action, its width, and its atomicity.
llvm-svn: 334239
|
|
|
|
|
|
|
|
| |
We still emit shufflevector instructions we just do it from CGBuiltin.cpp now. This ensures the intrinsics that use this are only available on CPUs that support the feature.
I also added range checking to the immediate, but only checked it is 8 bits or smaller. We should maybe be stricter since we never use all 8 bits, but gcc doesn't seem to do that.
llvm-svn: 334237
|
|
|
|
|
|
|
|
|
|
| |
intrinsics.
We still lower them to native shuffle IR, but we do it in CGBuiltin.cpp now. This allows us to check the target feature and ensure the immediate fits in 8 bits.
This also improves our -O0 codegen slightly because we're able to see the zeroinitializer in the shuffle. It looks like it got lost behind a store+load previously.
llvm-svn: 334208
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
fmadd/fmsub/fmaddsub/fmsubadd builtins.
Summary:
We recently switch to using a selects in the intrinsics header files for FMA instructions. But the 512-bit versions support flavors with rounding mode which must be an Integer Constant Expression. This has forced those intrinsics to be implemented as macros. As it stands now the mask and mask3 intrinsics evaluate one of their macro arguments twice. If that argument itself is another intrinsic macro, we can end up over expanding macros. Or if its something we can CSE later it would show up multiple times when it shouldn't.
I tried adding __extension__ around the macro and making it an expression statement and declaring a local variable. But whatever name you choose for the local variable can never be used as the name of an input to the macro in user code. If that happens you would end up with the same name on the LHS and RHS of an assignment after expansion. We might be safe if we use __ in front of the variable names because those names are reserved and user code shouldn't use that, but I wasn't sure I wanted to make that claim.
The other option which I've chosen here, is to add back _mask, _maskz, and _mask3 flavors of the builtin which we will expand in CGBuiltin.cpp to replicate the argument as needed and insert any fneg needed on the third operand to make a subtract. The _maskz isn't truly necessary if we have an unmasked version or if we use the masked version with a -1 mask and wrap a select around it. But I've chosen to make things more uniform.
I separated out the scalar builtin handling to avoid too many things going on in EmitX86FMAExpr. It was different enough due to the extract and insert that the minor duplication of the CreateCall was probably worth it.
Reviewers: tkrupa, RKSimon, spatel, GBuella
Reviewed By: tkrupa
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D47724
llvm-svn: 334159
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D47804
llvm-svn: 334108
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
and 256 bit vector types. Use them to implement the extract and insert intrinsics.
Previously we were just using extended vector operations in the header file.
This unfortunately allowed non-constant indices to be used with the intrinsics. This is incompatible with gcc, icc, and MSVC. It also introduces a different performance characteristic because non-constant index gets lowered to a vector store and an element sized load.
By adding the builtins we can check for the index to be a constant and ensure its in range of the vector element count.
User code still has the option to use extended vector operations themselves if they need non-constant indexing.
llvm-svn: 334057
|
|
|
|
|
|
|
|
| |
feature as requiring both mmx and the sse feature.
Previously we only checked the sse feature, but this means that if you passed -mno-mmx, the builtins/intrinsics wouldn't be disabled in the frontend and would instead fail backend isel.
llvm-svn: 333980
|
|
|
|
|
|
|
|
|
|
|
| |
We need to implement _interlockedbittestandset as a builtin for
windows.h, so we might as well do the whole family. It reduces code
duplication anyway.
Fixes PR33188, a long standing bug in our bittest implementation
encountered by Chakra.
llvm-svn: 333978
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
platforms."
Adding __attribute__((aligned(32))) to __m256 breaks the implementation
of _mm256_loadu_ps on Windows. On Windows, alignment attributes have
higher precedence than packing attributes.
We also might want to carefully consider the consequences of changing
our vector typedefs, since many users copy them and invent their own
new, non-Intel specific vector type names.
llvm-svn: 333958
|
|
|
|
|
|
|
|
| |
to pass the first input a second time.
This is more consistent with other usages of builtin_shufflevector. Later optimization passes or codegen will detect the duplicate vector and replace it with undef. Using _mm_undefined just puts a zeroinitializer that still needs to be optimized out later.
llvm-svn: 333944
|
|
|
|
|
|
|
|
| |
usages could just be undefined.
One of the arguments was being used when the passthru argument is unused due to the mask being all 1s. But in that case the actual value doesn't matter so we should use undef instead to avoid expanding the macro argument unnecessarily.
llvm-svn: 333865
|
|
|
|
| |
llvm-svn: 333858
|
|
|
|
|
|
| |
This is the correct way to say it takes no arguments in C.
llvm-svn: 333855
|
|
|
|
|
|
| |
__builtin_ia32_vbroadcastf128_ps256 with an unaligned load intrinsics and a __builtin_shufflevector call.
llvm-svn: 333853
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes two major problems:
- We were not capping vector alignment as desired on 32-bit ARM.
- We were using different alignments based on the AVX settings on
Intel, so we did not have a consistent ABI.
This is an ABI break, but we think we can get away with it because
vectors tend to be used mostly in inline code (which is why not having
a consistent ABI has not proven disastrous on Intel).
Intel's AVX types are specified as having 32-byte / 64-byte alignment,
so align them explicitly instead of relying on the base ABI rule.
Note that this sort of attribute is stripped from template arguments
in template substitution, so there's a possibility that code templated
over vectors will produce inadequately-aligned objects. The right
long-term solution for this is for alignment attributes to be
interpreted as true qualifiers and thus preserved in the canonical type.
llvm-svn: 333791
|
|
|
|
|
|
|
|
|
|
| |
around their __builtin function with appropriate arguments rather than just passing arguments to the masked intrinsic.
This is more consistent with all of our other avx512 macro intrinsics.
It also fixes a bad cast where an argument was casted to mmask8 when it should have been a mmask16.
llvm-svn: 333778
|
|
|
|
|
|
|
| |
This was missed in a few places in SVN r333613, causing compilation
errors if these macros are used e.g. as parameter to a function.
llvm-svn: 333734
|
|
|
|
|
|
|
|
| |
equivalents.
Previously we were just passing -1 mask to the masked builtin. This changes it to the more generic way that the 128/256 bit use.
llvm-svn: 333626
|
|
|
|
| |
llvm-svn: 333617
|
|
|
|
|
|
|
|
|
|
| |
_m512(i|d)/_m256(i|d/_m128(i|d) first.
The majority of the cases were correct. This fixes the few that weren't.
I also removed some superfluous parentheses in non-macros that confused by attempts at grepping for missing casts.
llvm-svn: 333615
|
|
|
|
|
|
|
|
|
|
| |
I think this is a holdover from when we used to declare variables inside the macros. And then its been copy and pasted forward for years every time a new macro intrinsic gets added.
Interestingly this caused some tests for IRGen to be slightly more optimized. We now return a zeroinitializer directly instead of going through a store+load.
It also removed a bogus error message on another test.
llvm-svn: 333613
|
|
|
|
|
|
|
|
| |
Most of the origial comments used C style /* */ comments, but some C++ // comments had snuck in over time.
Still need to convert all the doxygen comments. Which is much harder to do.
llvm-svn: 333603
|
|
|
|
|
|
|
|
| |
fail if you run it through -pedantic -ansi.
All of these are lines that create a 'compound literal' to concatenate elements together.
llvm-svn: 333593
|
|
|
|
|
|
|
|
| |
We don't need the insertion back into the original vector at the end. The builtin already understands that.
This is different than _mm_sqrt_sd which takes two arguments and we do need to insert.
llvm-svn: 333572
|
|
|
|
|
|
|
|
|
|
| |
Intel Intrinsics Guide.
We had quite a few for different element sizes of integers sometimes with strange target features attached to them.
We only need a single version for each of _m128i, _m256i, and _m512i with the target feature that first introduced those types.
llvm-svn: 333568
|
|
|
|
|
|
|
|
| |
builtin that returns void.
Found by running the intrinsic headers through -pedantic -ansi.
llvm-svn: 333563
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch replaces all packed (and scalar without rounding
mode) fused intrinsics with fmadd/fmaddsub variations.
Then fmadd/fmaddsub are lowered to native IR.
Patch by tkrupa
Reviewers: craig.topper, sroland, spatel, RKSimon
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D47444
llvm-svn: 333555
|
|
|
|
| |
llvm-svn: 333515
|
|
|
|
| |
llvm-svn: 333509
|
|
|
|
|
|
|
|
| |
Mostly this fixes the names of all the 128-bit intrinsics to start with _mm_ instead of _mm128_ as is the convention and what the Intel docs say.
This also fixes the name of the bitshuffle intrinsics to say epi64 for 128 and 256 bit versions.
llvm-svn: 333497
|
|
|
|
|
|
| |
to a single version without masking. Use select builtins with appropriate operand instead.
llvm-svn: 333387
|
|
|
|
|
|
|
|
| |
better use of other functions and to reduce width to 256 and 128 bits were possible."
This wasn't supposed to be commited yet.
llvm-svn: 333349
|
|
|
|
|
|
| |
This reduces from 12 builtins to 6 since we no longer need a mask and maskz version.
llvm-svn: 333348
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
other functions and to reduce width to 256 and 128 bits were possible.
Summary:
We only need to use 512 bit vectors all the way through v8i64 reductions since those max instructions are new to avx512f and only available in 512 bits until SKX.
For v16i32 and floating point we have legacy 128/256 bit instructions we can use.
I've tried to use other intrinsics to reduce the verbosity of the code and avoid having to mention all the shuffles. I've also removed all the -1 shuffle indices so the output sequence is fully specified and not left to backend optimization.
Reviewers: RKSimon, spatel, GBuella
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D47401
llvm-svn: 333347
|
|
|
|
|
|
|
|
|
|
|
|
| |
An intrinsic for an old instruction, as described in the Intel SDM.
Reviewers: craig.topper, rnk
Reviewed By: craig.topper, rnk
Differential Revision: https://reviews.llvm.org/D47142
llvm-svn: 333256
|
|
|
|
| |
llvm-svn: 333211
|
|
|
|
|
|
| |
This is an AMD intrinsic not an Intel intrinsic so it shouldn't be in immintrin.h
llvm-svn: 333124
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Since clang r332929 these two headers throw errors when included from somewhere else than their wrapper header. It seems marking them as textual is the best way to fix the builds.
Fixes this new module build error:
While building module '_Builtin_intrinsics' imported from ...:
In file included from <module-includes>:2:
In file included from lib/clang/7.0.0/include/immintrin.h:54:
In file included from lib/clang/7.0.0/include/wmmintrin.h:29:
lib/clang/7.0.0/include/__wmmintrin_aes.h:25:2: error: "Never use <__wmmintrin_aes.h> directly; include <wmmintrin.h> instead."
#error "Never use <__wmmintrin_aes.h> directly; include <wmmintrin.h> instead."
Reviewers: rsmith, v.g.vassilev, craig.topper
Reviewed By: craig.topper
Subscribers: craig.topper, cfe-commits
Differential Revision: https://reviews.llvm.org/D47277
llvm-svn: 333123
|
|
|
|
|
|
|
|
|
|
| |
This matches the Intel documentation which shows them available by importing immintrin.h. x86intrin.h also includes immintrin.h so anyone including x86intrin.h will still get them.
This is different than gcc, but I don't think we were a perfect match there already. I'm unclear what gcc's policy is about how they choose which to add things to.
Differential Revision: https://reviews.llvm.org/D47182
llvm-svn: 333110
|
|
|
|
|
|
|
|
| |
(1) I added some \see cross-references to a few select intrinsics that are related (and have the same or similar semantics).
(2) pmmintrin.h, smmintrin.h, xmmintrin.h have very few minor formatting changes. They make rendering of our intrinsics documentation better.
llvm-svn: 333065
|