| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
llvm-svn: 372062
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This is useful for targets which have prefetch instructions for non-default address spaces.
<rdar://problem/42662136>
Subscribers: nemanjai, javed.absar, hiraditya, kbarton, jkorous, dexonsmith, cfe-commits, llvm-commits, RKSimon, hfinkel, t.p.northover, craig.topper, anemet
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D65254
llvm-svn: 367032
|
|
|
|
|
|
|
|
| |
the store as a <2 x float> instead of i64.
This is similar to what we do for loadl_pi and loadh_pi.
llvm-svn: 365669
|
|
|
|
|
|
|
|
|
|
| |
intrinsics when compiled for 32-bit mode.
All the command lines are for 64-bit mode, but sometimes I compile
the tests in 32-bit mode to see what assembly we get and we need
to skip these to do that.
llvm-svn: 365668
|
|
|
|
|
|
|
|
| |
Add secondary triple to existing SSE test for it. I audited other uses
of __attribute__((__packed__)) in the intrinsic headers, and this seemed
to be the only missing one.
llvm-svn: 353878
|
|
|
|
|
|
|
|
|
|
|
|
| |
is suitable for use in scalar mask intrinsics.
This will convert the i8 mask argument to <8 x i1>, extract an i1, and then emit a select instruction. This replaces the '(__U & 1)' and ternary operator used in some of the intrinsics. The old sequence was lowered to a scalar AND and compare. The new sequence uses an i1 vector that will interoperate better with other mask intrinsics.
This removes the need to handle div_ss/sd specially in CGBuiltin.cpp. A follow up patch will add the GCCBuiltin name back in llvm and remove the custom handling.
I made some adjustments to legacy move_ss/sd intrinsics which we reused here to do a simpler extract and insert instead of 2 extracts and two inserts or a shuffle.
llvm-svn: 336622
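As a rough illustration of the select semantics described in this message, the following is a scalar model in plain C, not the actual header or CGBuiltin.cpp code. The function name `select_lane0` and its parameters are hypothetical and only mirror the "extract bit 0 of the mask, then select" behaviour.

```c
#include <stdint.h>

/* Hypothetical scalar model (not the clang implementation):
 * lane 0 of a masked scalar operation takes the computed result when
 * bit 0 of the 8-bit mask is set, and the passthrough value otherwise.
 * This mirrors: bitcast i8 -> <8 x i1>, extract element 0, select. */
static float select_lane0(uint8_t mask, float op_result, float passthru) {
    int bit0 = mask & 1;               /* only the lowest mask bit matters */
    return bit0 ? op_result : passthru;
}
```

Only bit 0 participates, so a mask such as 0xFE still selects the passthrough value.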
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: craig.topper, spatel, RKSimon, igorb, uriel.k
Reviewed By: craig.topper
Subscribers: tkrupa, cfe-commits
Differential Revision: https://reviews.llvm.org/D41168
llvm-svn: 334850
|
|
|
|
|
|
|
|
| |
We don't need the insertion back into the original vector at the end. The builtin already understands that.
This is different than _mm_sqrt_sd which takes two arguments and we do need to insert.
llvm-svn: 333572
|
|
|
|
|
|
|
| |
Following r333110:
"Move all Intel defined intrinsic includes into immintrin.h"
llvm-svn: 333160
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
x86 has undef SSE/AVX intrinsics that should represent a bogus register operand.
This is not the same as LLVM's undef value which can take on multiple bit patterns.
There are better solutions / follow-ups to this discussed here:
https://bugs.llvm.org/show_bug.cgi?id=32176
...but this should prevent miscompiles with a one-line code change.
Differential Revision: https://reviews.llvm.org/D30834
llvm-svn: 297588
|
|
|
|
|
|
|
|
|
|
|
|
| |
The X86 clang/test/CodeGen/*builtins.c tests define the mm_malloc.h include
guard as a hack for avoiding its inclusion (mm_malloc.h requires a hosted
environment since it expects stdlib.h to be available - which is not the case
in these internal clang codegen tests).
This patch removes this hack and instead passes -ffreestanding to clang cc1.
Differential Revision: https://reviews.llvm.org/D24825
llvm-svn: 282581
|
|
|
|
|
|
| |
possible problems in headers.
llvm-svn: 277696
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
generic IR
D20859 and D20860 attempted to replace the SSE (V)CVTTPS2DQ and VCVTTPD2DQ truncating conversions with generic IR instead.
It turns out that the behaviour of these intrinsics is different enough from generic IR that this will cause problems: INF/NAN/out-of-range values are guaranteed to result in a 0x80000000 value, which plays havoc with constant folding, since folding converts them to either zero or UNDEF. This is also an issue with the scalar implementations (which were already generic IR and what I was trying to match).
This patch changes both scalar and packed versions back to using x86-specific builtins.
It also deals with the other scalar conversion cases that are runtime rounding mode dependent and can have similar issues with constant folding.
Differential Revision: https://reviews.llvm.org/D22105
llvm-svn: 276102
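To make the difference concrete, here is a minimal scalar sketch in portable C of the truncating-conversion behaviour this message describes. The name `cvtt_model` is hypothetical; it models a single lane and is not the builtin itself.

```c
#include <stdint.h>

/* Hypothetical one-lane model of the truncating float -> int32 conversion:
 * NaN, infinities and out-of-range inputs all yield 0x80000000 (INT32_MIN),
 * whereas a plain C cast of such values is undefined behaviour, which is
 * why constant folding of the generic form can produce something else. */
static int32_t cvtt_model(float x) {
    if (x != x)                       /* NaN compares unequal to itself */
        return INT32_MIN;
    if (x >= 2147483648.0f || x < -2147483648.0f)
        return INT32_MIN;             /* covers +/-Inf and out-of-range */
    return (int32_t)x;                /* in range: truncate toward zero */
}
```

In-range inputs behave like an ordinary cast; only the special cases are pinned to the fixed 0x80000000 result.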
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As noted in the code comment, a potential follow-on would be to remove
the builtins themselves. Other than ord/unord, this already works as
expected. E.g.:
typedef float v4sf __attribute__((__vector_size__(16)));
v4sf fcmpgt(v4sf a, v4sf b) { return a > b; }
Differential Revision: http://reviews.llvm.org/D21268
llvm-svn: 272840
|
|
|
|
| |
llvm-svn: 271227
|
|
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D20617
llvm-svn: 271219
|
|
|
|
|
|
|
|
| |
that will compile to a native unaligned store. Remove the builtins since they are no longer used.
Intrinsics will be removed from llvm in a future commit.
llvm-svn: 271214
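For readers unfamiliar with the idea of expressing an unaligned store in ordinary code rather than through a builtin, here is a small portable sketch. It uses `memcpy`, which is an assumption made for illustration and is not necessarily the construct used in the headers; the helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: store/load a 64-bit value at an address with no
 * alignment guarantee. Compilers typically turn the fixed-size memcpy into
 * a single native unaligned move on targets that support one. */
static void store_u64_unaligned(void *dst, uint64_t value) {
    memcpy(dst, &value, sizeof value);
}

static uint64_t load_u64_unaligned(const void *src) {
    uint64_t value;
    memcpy(&value, src, sizeof value);
    return value;
}
```

Because the access is written in plain C, the optimizer can see through it, which a call to an opaque builtin would not allow.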
|
|
|
|
| |
llvm-svn: 270679
|
|
|
|
|
|
|
|
| |
intrinsics. We were previously matching on other stores in the IR from this being an -O0 test.
We should probably look into making the storeu builtins just emit a normal store with an alignment of 1.
llvm-svn: 270664
|
|
|
|
|
|
| |
sse-builtins.c now just covers SSE1 intrinsics
llvm-svn: 270083
|
|
|
|
| |
llvm-svn: 269852
|
|
|
|
|
|
| |
intrinsic they're testing.
llvm-svn: 269735
|
|
|
|
|
|
| |
inline assembly to implement _mm_pause.
llvm-svn: 252712
|
|
|
|
|
|
| |
couple intrinsics that were supposed to operate on MMX registers. Otherwise we end up operating on GPRs. Throw in a test for _mm_mul_su32 while I was there.
llvm-svn: 252711
|
|
|
|
|
|
| |
Transferred SSSE3 instructions from sse-builtins.c
llvm-svn: 246948
|
|
|
|
|
|
| |
Transferred SSE41 instructions from sse-builtins.c
llvm-svn: 246947
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Added missing SSE/AVX 'undefined' intrinsics (PR24040):
_mm_undefined_pd, _mm_undefined_ps + _mm_undefined_si128
_mm256_undefined_pd, _mm256_undefined_ps + _mm256_undefined_si256
_mm512_undefined, _mm512_undefined_ps, _mm512_undefined_pd + _mm512_undefined_epi32
Added builtin intrinsics:
__builtin_ia32_undef128, __builtin_ia32_undef256 + __builtin_ia32_undef512
Differential Revision: http://reviews.llvm.org/D12052
llvm-svn: 246083
|
|
|
|
| |
llvm-svn: 245815
|
|
|
|
| |
llvm-svn: 230795
|
|
|
|
| |
llvm-svn: 229484
|
|
|
|
|
|
| |
instead of intrinsics. This should allow the intrinsics to be removed from the backend.
llvm-svn: 229474
|
|
|
|
|
|
| |
that handles both.
llvm-svn: 229469
|
|
|
|
|
|
| |
that had optimizations on. This caused the check patterns to not quite match.
llvm-svn: 229073
|
|
|
|
|
|
| |
and _mm_srli_si128. This matches Intel documentation and gcc.
llvm-svn: 229066
|
|
|
|
|
|
|
|
|
|
| |
intrinsic files.
These still lower to the same intrinsics as before.
This is preparation for bounds checking the immediate on the avx version of the builtin so we don't pass illegal immediates into the backend. Since SSE uses a smaller size immediate, it's not possible to bounds check when using a shared builtin. Rather than creating a clang-specific builtin for the different immediate, I decided (after consulting with Chandler) that it was better to match gcc.
llvm-svn: 224879
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instructions from __nodebug__ functions don't have file:line
information even when inlined into non-nodebug functions. As a result,
intrinsics (SSE and other) from <*intrin.h> clang headers _never_
have file:line information.
With this change, an instruction without !dbg metadata gets one from
the call instruction when inlined.
Fixes PR19001.
llvm-svn: 210459
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Most of the clang header patch by Simon Pilgrim @ SCEE.
Also fixed (or added) clang tests for these intrinsics.
LLVM tests to make sure we get the blend instruction out of these
shufflevectors are at http://reviews.llvm.org/D3600
Reviewers: eli.friedman, craig.topper, rafael
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D3601
llvm-svn: 208664
|
|
|
|
|
|
|
|
|
| |
Now, all extract & insert intrinsics should have the correct AND operation
to ignore higher bits.
rdar://15250497
llvm-svn: 193267
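The index masking mentioned in this message can be sketched with a scalar model in plain C. The layout (four 16-bit lanes) and the function names below are hypothetical and chosen only for illustration; they are not the header implementation.

```c
#include <stdint.h>

/* Hypothetical model of a 4 x 16-bit vector: the lane index is ANDed
 * with 3 so that any higher bits in the index are ignored. */
static uint16_t extract_u16_model(const uint16_t lanes[4], int index) {
    return lanes[index & 3];
}

static void insert_u16_model(uint16_t lanes[4], uint16_t value, int index) {
    lanes[index & 3] = value;
}
```

With this masking, an index of 5 addresses lane 1 and an index of 6 addresses lane 2, instead of reading or writing out of bounds.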
|
|
|
|
|
|
|
| |
This is in line with implementation of _mm_extract_pi16.
rdar://15250497
llvm-svn: 193187
|
|
|
|
|
|
|
|
|
| |
While I'm here, also fix the alignment computation for the whole family of
intrinsics.
PR17298.
llvm-svn: 191243
|
|
|
|
|
|
| |
tests fail.
llvm-svn: 188447
|
|
|
|
|
|
|
| |
These intrinsics pass through the upper FP values from the input.
rdar://12558838
llvm-svn: 166743
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
goodness because it provides opportunities to clean things up. For example,
  uint64_t t1(__m128i vA)
  {
    uint64_t Alo;
    _mm_storel_epi64((__m128i*)&Alo, vA);
    return Alo;
  }
was generating
  movq %xmm0, -8(%rbp)
  movq -8(%rbp), %rax
and now generates
  movd %xmm0, %rax
rdar://11282581
llvm-svn: 155924
|
|
|
|
|
|
| |
parentheses around uses of vector macro arguments.
llvm-svn: 153732
|
|
|
|
| |
llvm-svn: 153726
|
|
|
|
|
|
|
|
| |
posix-unlike hosts.
Without -ffreestanding, clang tries to look for /usr/include/stdlib.h in the host filesystem, even on Windows hosts.
llvm-svn: 139899
|
|
alignment (which probably has little effect in practice, but better to get it right). Make the load in _mm_loadh_pi and _mm_loadl_pi a single LLVM IR instruction to make optimizing easier for CodeGen.
rdar://10054986
llvm-svn: 139874
|