------------------------------------------------------------------------
The pattern we replaced these with may be too hard to match, as demonstrated by
PR41496 and PR41316.
This patch restores the intrinsics so that we can start focusing on
optimizing them.
I've mostly reverted the original patch that removed them, though I modified
the AVX512 intrinsics to not have masking built in.
Differential Revision: https://reviews.llvm.org/D60674
llvm-svn: 358427
------------------------------------------------------------------------
As noted on PR40203, for gcc compatibility we need to support non-immediate values in the 'slli/srli/srai' shift-by-immediate vector intrinsics.
llvm-svn: 350619
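A minimal sketch of what this enables (function and variable names are illustrative, not from the patch):

#include <emmintrin.h>

/* With gcc-compatible semantics the count operand of the
   shift-by-immediate intrinsics may now be a runtime value,
   not just a literal constant. */
__m128i shift_each_dword(__m128i v, int count) {
  return _mm_slli_epi32(v, count);
}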
------------------------------------------------------------------------
…intrinsics (clang)
This emits SADD_SAT/SSUB_SAT generic intrinsics for the SSE signed saturated math intrinsics.
LLVM counterpart: https://reviews.llvm.org/D55894
Differential Revision: https://reviews.llvm.org/D55890
llvm-svn: 349743
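A brief illustration (the function name is mine, not from the patch); the sibling unsigned patch below maps the epu variants to UADD_SAT/USUB_SAT the same way:

#include <emmintrin.h>

/* After this change, the saturated adds/subs lower to the generic
   llvm.sadd.sat/llvm.ssub.sat intrinsics rather than x86-specific ones. */
__m128i saturating_add_i8(__m128i a, __m128i b) {
  return _mm_adds_epi8(a, b);
}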
------------------------------------------------------------------------
…generic intrinsics (clang)
Sibling patch to D55855, this emits UADD_SAT/USUB_SAT generic intrinsics for the SSE saturated math intrinsics instead of expanding to an IR code sequence that could be difficult to reassemble.
Differential Revision: https://reviews.llvm.org/D55879
llvm-svn: 349631
------------------------------------------------------------------------
Summary: This is the patch that lowers x86 intrinsics to native IR in order to enable optimizations.
Reviewers: craig.topper, spatel, RKSimon
Reviewed By: craig.topper
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D46892
llvm-svn: 339651
------------------------------------------------------------------------
Summary:
Lowering some vector comparison builtins to fcmp IR instructions.
This ignores the signaling behaviour specified in the predicate
argument of said builtins.
Affected AVX512 builtins:
__builtin_ia32_cmpps128_mask
__builtin_ia32_cmpps256_mask
__builtin_ia32_cmpps512_mask
__builtin_ia32_cmppd128_mask
__builtin_ia32_cmppd256_mask
__builtin_ia32_cmppd512_mask
Reviewers: craig.topper, uriel.k, RKSimon, andrew.w.kaylor, spatel, scanon, efriedma
Reviewed By: craig.topper, spatel, efriedma
Differential Revision: https://reviews.llvm.org/D45616
llvm-svn: 335339
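A small usage sketch (assuming AVX512F; the function name is illustrative):

#include <immintrin.h>

/* The predicate now lowers to an IR 'fcmp olt' feeding the mask;
   per the note above, the ordered-signaling variant is treated
   the same as the quiet one. */
__mmask16 lt_mask(__m512 a, __m512 b) {
  return _mm512_cmp_ps_mask(a, b, _CMP_LT_OS);
}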
------------------------------------------------------------------------
…checking.
llvm-svn: 334311
------------------------------------------------------------------------
…and immediate range checking.
llvm-svn: 334265
------------------------------------------------------------------------
…checking and immediate range checking.
Test changes are due to differences in how we now generate undef elements. We also changed the types used for extractf128_si256/insertf128_si256 to match the signature of the builtin that previously existed, which this patch resurrects. This also matches gcc.
llvm-svn: 334261
------------------------------------------------------------------------
…feature requirements and check immediate range.
llvm-svn: 334249
------------------------------------------------------------------------
…intrinsics.
We still lower them to native shuffle IR, but we do it in CGBuiltin.cpp now. This allows us to check the target feature and ensure the immediate fits in 8 bits.
This also improves our -O0 codegen slightly because we're able to see the zeroinitializer in the shuffle. It looks like it got lost behind a store+load previously.
llvm-svn: 334208
------------------------------------------------------------------------
Following r333110:
"Move all Intel defined intrinsic includes into immintrin.h"
llvm-svn: 333160
------------------------------------------------------------------------
The LLVM commit introduces a crash in LLVM's instruction selection.
I filed http://llvm.org/PR37260 with the test case.
llvm-svn: 330997
------------------------------------------------------------------------
This is the patch that lowers x86 intrinsics to native IR
in order to enable optimizations.
Patch by tkrupa
Differential Revision: https://reviews.llvm.org/D44786
llvm-svn: 330323
------------------------------------------------------------------------
I believe all the pieces are now in place in the backend to make this work correctly. We can either mask the inputs to 32 bits for pmuludq or shl/ashr for pmuldq, and then use a regular mul instruction. The backend should combine this to PMULUDQ/PMULDQ, and then SimplifyDemandedBits will remove the and/shifts.
Differential Revision: https://reviews.llvm.org/D45421
llvm-svn: 329605
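A scalar sketch of the described lowering (names are mine; the real change emits vector IR):

#include <stdint.h>

/* pmuludq: keep only the low 32 bits of each 64-bit lane, then do a
   plain multiply; the backend re-forms PMULUDQ and SimplifyDemandedBits
   removes the masking. */
uint64_t mul_lo32_u(uint64_t a, uint64_t b) {
  return (a & 0xffffffffULL) * (b & 0xffffffffULL);
}

/* pmuldq: the IR uses shl+ashr by 32 to sign-extend the low half;
   in C the same extension is a pair of casts. */
int64_t mul_lo32_s(int64_t a, int64_t b) {
  return (int64_t)(int32_t)a * (int64_t)(int32_t)b;
}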
------------------------------------------------------------------------
This patch replaces the perm2f128 intrinsics with native shuffle vectors.
This uses a pretty simple approach: allocate source 0 to the lower half input and source 1 to the upper half input. Then it's just a matter of filling in the indices to use either the lower or upper half of that specific source. This can result in the same source being used by both operands; InstCombine or SelectionDAGBuilder should be able to clean that up.
Differential Revision: https://reviews.llvm.org/D37892
llvm-svn: 313418
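One concrete instance of the index scheme (immediate 0x31 chosen for illustration; typedef and name are mine):

typedef float v8sf __attribute__((__vector_size__(32)));

/* In concat(a,b) numbering, a is elements 0..7 and b is 8..15.
   imm = 0x31 selects the upper half of a for the low lane and the
   upper half of b for the high lane. */
v8sf perm2f128_imm31(v8sf a, v8sf b) {
  return __builtin_shufflevector(a, b, 4, 5, 6, 7, 12, 13, 14, 15);
}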
------------------------------------------------------------------------
This patch, together with a matching llvm patch (https://reviews.llvm.org/D37693), implements the lowering of X86 ABS intrinsics to IR.
Differential Revision: https://reviews.llvm.org/D37694
llvm-svn: 313133
------------------------------------------------------------------------
Differential Revision: https://reviews.llvm.org/D37562
llvm-svn: 313011
------------------------------------------------------------------------
MOVNTDQA non-temporal aligned vector loads can be correctly represented using generic builtin loads, allowing us to remove the existing x86 intrinsics.
LLVM companion patch: D31767.
Differential Revision: https://reviews.llvm.org/D31766
llvm-svn: 300326
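A conceptual sketch of the new header pattern (typedef and function name are illustrative):

typedef long long v2di __attribute__((__vector_size__(16)));

/* The load is now expressed generically; with SSE4.1 enabled the
   backend can still select MOVNTDQA for it. */
v2di stream_load_128(const v2di *p) {
  return __builtin_nontemporal_load(p);
}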
------------------------------------------------------------------------
x86 has undef SSE/AVX intrinsics that should represent a bogus register operand.
This is not the same as LLVM's undef value which can take on multiple bit patterns.
There are better solutions / follow-ups to this discussed here:
https://bugs.llvm.org/show_bug.cgi?id=32176
...but this should prevent miscompiles with a one-line code change.
Differential Revision: https://reviews.llvm.org/D30834
llvm-svn: 297588
------------------------------------------------------------------------
The X86 clang/test/CodeGen/*builtins.c tests define the mm_malloc.h include
guard as a hack to avoid its inclusion (mm_malloc.h requires a hosted
environment, since it expects stdlib.h to be available - which is not the case
in these internal clang codegen tests).
This patch removes this hack and instead passes -ffreestanding to clang cc1.
Differential Revision: https://reviews.llvm.org/D24825
llvm-svn: 282581
------------------------------------------------------------------------
…possible problems in headers.
llvm-svn: 277696
------------------------------------------------------------------------
…makes them the same as what is done when using the SSE builtins for these same encodings.
llvm-svn: 274608
------------------------------------------------------------------------
…when the second source is unused. Rewrite immediate extractions in shuffle intrinsics to be in ((c >> x) & y) form instead of ((c & z) >> x) form. This way only x varies between each use, instead of having to vary both x and z.
llvm-svn: 274525
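A sketch of the two forms for a 2-bit field at position i (the macro name is mine):

/* Old form: ((c & z) >> x) - both the mask z and the shift x vary.
   New form: ((c >> x) & y) - the mask y is fixed, only x varies. */
#define EXTRACT_FIELD2(c, i) (((c) >> (2 * (i))) & 0x3)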
------------------------------------------------------------------------
Sibling patch to r272932:
http://reviews.llvm.org/rL272932
llvm-svn: 272933
------------------------------------------------------------------------
As noted in the code comment, a potential follow-on would be to remove
the builtins themselves. Other than ord/unord, this already works as
expected. E.g.:
typedef float v4sf __attribute__((__vector_size__(16)));
v4sf fcmpgt(v4sf a, v4sf b) { return a > b; }
Differential Revision: http://reviews.llvm.org/D21268
llvm-svn: 272840
------------------------------------------------------------------------
…directly in the header file instead of in CGBuiltin.cpp. Simplify the sse2 equivalents as well.
llvm-svn: 272246
------------------------------------------------------------------------
…generic IR (clang)
The (V)PMOVSX and (V)PMOVZX sign/zero extension intrinsics can be safely represented as generic __builtin_convertvector calls instead of x86 intrinsics.
This patch removes the clang builtins and their use in the sse2/avx headers - a companion patch will remove/auto-upgrade the llvm intrinsics.
Note: We already did this for SSE41 PMOVSX some time ago.
Differential Revision: http://reviews.llvm.org/D20684
llvm-svn: 271106
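A sketch of the resulting header pattern for pmovsxwd (the typedefs and function name are illustrative):

typedef short v8hi __attribute__((__vector_size__(16)));
typedef short v4hi __attribute__((__vector_size__(8)));
typedef int   v4si __attribute__((__vector_size__(16)));

/* Shuffle out the low four elements, then let the generic
   conversion builtin do the sign extension. */
v4si pmovsxwd_like(v8hi x) {
  v4hi lo = __builtin_shufflevector(x, x, 0, 1, 2, 3);
  return __builtin_convertvector(lo, v4si);
}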
------------------------------------------------------------------------
…gathers
llvm-svn: 270833
------------------------------------------------------------------------
llvm/test/CodeGen/X86/avx2-intrinsics-fast-isel.ll will be synced to this
llvm-svn: 270708
------------------------------------------------------------------------
As discussed on the mailing list, backend tests need to be put in llvm/test/CodeGen/X86 as fast-isel tests using IR that is as close as possible to what is generated here.
The llvm tests will be (re)added in a future commit.
llvm-svn: 255050
------------------------------------------------------------------------
…index input.
llvm-svn: 254270
------------------------------------------------------------------------
…about optimization options.
llvm-svn: 250271
------------------------------------------------------------------------
Per Intel intrinsics guide:
- _mm256_stream_load_si256 takes `__m256i const *'
- _mm_stream_load_si128 takes `__m128i *', for no good reason.
Let's accept const* for both.
llvm-svn: 249213
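Usage after the change (assuming AVX2 for the 256-bit form; function names are mine):

#include <immintrin.h>

/* Both intrinsics now accept a pointer-to-const. */
__m256i nt_load_256(const __m256i *p) { return _mm256_stream_load_si256(p); }
__m128i nt_load_128(const __m128i *p) { return _mm_stream_load_si128(p); }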
------------------------------------------------------------------------
Currently FastISel doesn't know how to select vector bitcasts.
During instruction selection, fast-isel always falls back to SelectionDAG
every time it encounters a vector bitcast.
As a consequence of this, all the 'packed vector shift by immediate count'
test cases in avx2-builtins.c are optimized by the DAGCombiner.
In particular, the DAGCombiner would always fold trivial stack loads of
constant shift counts into the operands of packed shift builtins.
This behavior would start changing as soon as I reapply revision 249121.
That revision would teach x86 fast-isel how to select bitcasts between vector
types of the same size.
As a consequence of that change, fast-isel would less often fall back to
SelectionDAG. More importantly, DAGCombiner would no longer be able to
simplify the code by folding the stack reload of a constant.
No functional change.
llvm-svn: 249142
------------------------------------------------------------------------
…test that our intrinsics behave the same under -fsigned-char and
-funsigned-char.
This further testing uncovered that AVX-2 has a broken cmpgt for 8-bit
elements, and has for a long time. This is fixed in the same way as
SSE4 handles the case.
The other ISA extensions currently work correctly because they use
specific instruction intrinsics. As soon as they are rewritten in terms
of generic IR, they will need to add these special casts. I've added the
necessary testing to catch this however, so we shouldn't have to chase
it down again.
I considered changing the core typedef to be signed, but that seems like
a bad idea. Notably, it would be an ABI break if anyone is reaching into
the innards of the intrinsic headers and passing __v16qi on an API
boundary. I can't be completely confident that this wouldn't happen due
to a macro expanding in a lambda, etc., so it seems much better to leave
it alone. It also matches GCC's behavior exactly.
A fun side note is that for both GCC and Clang, -funsigned-char really
does change the semantics of __v16qi. To observe this, consider:
% cat x.cc
#include <smmintrin.h>
#include <iostream>
int main() {
  __v16qi a = { 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
  __v16qi b = _mm_set1_epi8(-1);
  std::cout << (int)(a / b)[0] << ", " << (int)(a / b)[1] << '\n';
}
% clang++ -o x x.cc && ./x
-1, 1
% clang++ -funsigned-char -o x x.cc && ./x
0, 1
However, while this may be surprising, both Clang and GCC agree.
Differential Revision: http://reviews.llvm.org/D13324
llvm-svn: 249097
------------------------------------------------------------------------
llvm-svn: 245992
------------------------------------------------------------------------
We agreed for r245605 that, as long as we don't affect -O0 codegen
too much, it's OK to use native constructs rather than intrinsics.
Let's test that, starting with AVX2 here.
See PR24580.
Differential Revision: http://reviews.llvm.org/D12212
llvm-svn: 245987
------------------------------------------------------------------------
This lets us optimize them better. We agreed to remove the intrinsics
instead of combining them later, since even at -O0 we generate the expected
instructions. Plus, it's a nice cleanup.
Differential Revision: http://reviews.llvm.org/D10556
llvm-svn: 245605
------------------------------------------------------------------------
_mm_broadcastsd_pd is basically an alias for _mm_movedup_pd; however, the alias is only available from AVX2 onward.
llvm-svn: 237698
------------------------------------------------------------------------
These two intrinsics are alternative names for _mm256_slli_si256 and _mm256_srli_si256, respectively.
llvm-svn: 237693
------------------------------------------------------------------------
This is nearly identical to the v*f128_si256 parts of r231792 and r232052.
AVX2 introduced proper integer variants of the hacked integer insert/extract
C intrinsics that were created for this same functionality with AVX1.
This should complete the front end fixes for insert/extract128 intrinsics.
Corresponding LLVM patch to follow.
llvm-svn: 232109
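A short usage sketch of the AVX2 integer variants (function names are mine):

#include <immintrin.h>

__m128i upper_half(__m256i v) {
  return _mm256_extracti128_si256(v, 1);   /* immediate picks the half */
}

__m256i with_upper_half(__m256i v, __m128i x) {
  return _mm256_inserti128_si256(v, x, 1);
}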
------------------------------------------------------------------------
Originally we were using the same GCC builtins to lower this AVX2 vector
intrinsic. Instead we will now lower it directly to a vector shuffle.
This will not only allow LLVM to generate better code, but it will also allow us
to remove the GCC intrinsics.
Reviewed by Andrea
This is related to rdar://problem/18742778.
llvm-svn: 231081
------------------------------------------------------------------------
llvm-svn: 229484
------------------------------------------------------------------------
…instead of intrinsics. This should allow the intrinsics to be removed from the backend.
llvm-svn: 229474
------------------------------------------------------------------------
…__builtin_ia32_pslldqi256 to vector shuffles the backend recognizes. This is a step towards removing the corresponding intrinsics from the backend.
llvm-svn: 229348
------------------------------------------------------------------------
…made the 8-bit masks actually 8-bit arguments to these intrinsics.
These builtins are a mess. Many were missing the I qualifier, which
I added where obviously correct. Most aren't tested, but I've updated
the relevant tests. I've tried to catch all the things that should
become 'c' in this round.
It's also frustrating because the set of these is really ad-hoc and
doesn't really map that cleanly to the set supported by either GCC or
LLVM. Oh well...
llvm-svn: 217311
------------------------------------------------------------------------
llvm-svn: 208665
------------------------------------------------------------------------
Summary:
Most of the clang header patch by Simon Pilgrim @ SCEE.
Also fixed (or added) clang tests for these intrinsics.
LLVM tests to make sure we get the blend instruction out of these
shufflevectors are at http://reviews.llvm.org/D3600
Reviewers: eli.friedman, craig.topper, rafael
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D3601
llvm-svn: 208664
------------------------------------------------------------------------
llvm-svn: 207062
------------------------------------------------------------------------