| Commit message | Author | Age | Files | Lines |
|
|
|
|
|
| |
corresponding tests.
llvm-svn: 372063
|
|
|
|
|
|
|
|
|
|
| |
intrinsics when compiled for 32-bit mode.
All the command lines are for 64-bit mode, but sometimes I compile
the tests in 32-bit mode to see what assembly we get and we need
to skip these to do that.
llvm-svn: 365668
|
|
|
|
|
|
|
|
| |
intrinsic tests that use packed structs to control alignment.
r353878 fixed a bug in _mm_loadu_ps and added a command line to catch it. Adding additional command lines to prevent breaking other intrinsics in the future.
llvm-svn: 353887
|
|
|
|
|
|
|
|
| |
that cause extra bitcasts to be emitted in the IR.
Found via imprecise grepping of the -O0 IR. There could still be more bugs out there.
llvm-svn: 336487
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch removes an optimization used with the TRUE/FALSE
predicates, as was suggested in https://reviews.llvm.org/D45616
for r335339.
The optimization was buggy, since r335339 used it also
for *_mask builtins, without actually applying the mask -- the
mask argument was just ignored.
Reviewers: craig.topper, uriel.k, RKSimon, andrew.w.kaylor, spatel, scanon, efriedma
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D48715
llvm-svn: 336355
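The bug described above (a constant-true predicate short-circuiting the write mask) can be illustrated with a scalar model. This is a minimal sketch, not the code clang emits: `mask_cmp4` and `pred_true` are hypothetical names, and only 4 lanes are modeled.

```c
#include <stdint.h>

/* Hypothetical stand-in for an always-true predicate such as _CMP_TRUE_UQ. */
static int pred_true(float a, float b) { (void)a; (void)b; return 1; }

/* Hypothetical scalar model of a 4-lane masked compare returning a bitmask. */
static uint8_t mask_cmp4(uint8_t k, const float a[4], const float b[4],
                         int (*pred)(float, float)) {
    uint8_t result = 0;
    for (int i = 0; i < 4; ++i) {
        if (pred(a[i], b[i]))
            result |= (uint8_t)(1u << i);
    }
    /* The write mask k is applied even when the predicate is constant. */
    return (uint8_t)(result & k);
}
```

With the always-true predicate the result is just the mask `k` itself, not an all-ones value, which is the behavior the removed optimization failed to preserve for the `*_mask` builtins.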
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add test cases with each predicate using the following
intrinsics:
_mm_cmp_pd
_mm_cmp_ps
_mm256_cmp_pd
_mm256_cmp_ps
_mm_cmp_pd_mask
_mm_cmp_ps_mask
_mm256_cmp_pd_mask
_mm256_cmp_ps_mask
_mm512_cmp_pd_mask
_mm512_cmp_ps_mask
_mm_mask_cmp_pd_mask
_mm_mask_cmp_ps_mask
_mm256_mask_cmp_pd_mask
_mm256_mask_cmp_ps_mask
_mm512_mask_cmp_pd_mask
_mm512_mask_cmp_ps_mask
Some of these are marked with FIXME, as there is a bug in lowering
e.g. _mm512_mask_cmp_ps_mask.
llvm-svn: 336346
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Lowering some vector comparison builtins to fcmp IR instructions.
This ignores the signaling behaviour specified in the predicate
argument of said builtins.
Affected AVX512 builtins:
__builtin_ia32_cmpps128_mask
__builtin_ia32_cmpps256_mask
__builtin_ia32_cmpps512_mask
__builtin_ia32_cmppd128_mask
__builtin_ia32_cmppd256_mask
__builtin_ia32_cmppd512_mask
Reviewers: craig.topper, uriel.k, RKSimon, andrew.w.kaylor, spatel, scanon, efriedma
Reviewed By: craig.topper, spatel, efriedma
Differential Revision: https://reviews.llvm.org/D45616
llvm-svn: 335339
|
|
|
|
|
|
|
|
| |
D48464 contains changes that will loosen some of the range checks in SemaChecking to a DefaultError warning that can be disabled.
This patch adds explicit masking to avoid using the upper bits of immediates to gracefully handle the warning being disabled.
llvm-svn: 335308
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: craig.topper, spatel, RKSimon, igorb, uriel.k
Reviewed By: craig.topper
Subscribers: tkrupa, cfe-commits
Differential Revision: https://reviews.llvm.org/D41168
llvm-svn: 334850
|
|
|
|
|
|
|
|
| |
checking and immediate range checking.
Test changes are due to differences in how we generate undef elements now. We also changed the types used for extractf128_si256/insertf128_si256 to match the signature of the builtin that previously existed which this patch resurrects. This also matches gcc.
llvm-svn: 334261
|
|
|
|
|
|
| |
checking.
llvm-svn: 334256
|
|
|
|
|
|
| |
feature requirements and check immediate range.
llvm-svn: 334249
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
and 256 bit vector types. Use them to implement the extract and insert intrinsics.
Previously we were just using extended vector operations in the header file.
This unfortunately allowed non-constant indices to be used with the intrinsics. This is incompatible with gcc, icc, and MSVC. It also introduces a different performance characteristic because a non-constant index gets lowered to a vector store and an element sized load.
By adding the builtins we can check for the index to be a constant and ensure it's in range of the vector element count.
User code still has the option to use extended vector operations themselves if they need non-constant indexing.
llvm-svn: 334057
|
|
|
|
|
|
|
|
|
|
| |
I think this is a holdover from when we used to declare variables inside the macros. And then it's been copied and pasted forward for years every time a new macro intrinsic gets added.
Interestingly this caused some tests for IRGen to be slightly more optimized. We now return a zeroinitializer directly instead of going through a store+load.
It also removed a bogus error message on another test.
llvm-svn: 333613
|
|
|
|
|
|
|
| |
Following r333110:
"Move all Intel defined intrinsic includes into immintrin.h"
llvm-svn: 333160
|
|
|
|
|
|
|
|
|
|
|
|
| |
packed float conversion intrinsics.
I believe this is safe assuming the default FP environment. The conversion might be inexact, but it can never overflow the FP type so this shouldn't be undefined behavior for the uitofp/sitofp instructions.
We already do something similar for scalar conversions.
Differential Revision: https://reviews.llvm.org/D46863
llvm-svn: 332882
|
|
|
|
|
|
|
|
|
|
| |
This patch replaces the perm2f128 intrinsics with native shuffle vectors.
This uses a pretty simple approach to allocate source 0 to the lower half input and source 1 to the upper half input. Then it's just a matter of filling in the indices to use either the lower or upper half of that specific source. This can result in the same source being used by both operands. InstCombine or SelectionDAGBuilder should be able to clean that up.
Differential Revision: https://reviews.llvm.org/D37892
llvm-svn: 313418
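The index-filling approach described above can be sketched as a scalar model over four doubles. This is an illustration under assumptions, not the clang source: `perm2f128_model` is a hypothetical name, and the zeroing bits of the immediate (bits 3 and 7) are deliberately left out.

```c
/* Scalar model for 4 x double: two 128-bit halves of 2 elements each.
   Shuffle indices 0..3 refer to the first shuffle operand, 4..7 to the second. */
enum { NUM_ELTS = 4, HALF = NUM_ELTS / 2 };

static void perm2f128_model(const double a[NUM_ELTS], const double b[NUM_ELTS],
                            unsigned imm, double out[NUM_ELTS]) {
    const double *ops[2];
    unsigned indices[NUM_ELTS];

    for (unsigned l = 0; l < 2; ++l) {
        unsigned ctrl = (imm >> (l * 4)) & 0xF;
        /* Bit 1 of the control nibble picks which input feeds operand l. */
        ops[l] = (ctrl & 2) ? b : a;
        for (unsigned i = 0; i < HALF; ++i) {
            /* Start with element i in the low half of operand l. */
            unsigned idx = l * NUM_ELTS + i;
            /* Bit 0 switches to the high half of that operand. */
            if (ctrl & 1)
                idx += HALF;
            indices[l * HALF + i] = idx;
        }
    }

    /* Apply the shuffle: index / NUM_ELTS selects the operand. */
    for (unsigned i = 0; i < NUM_ELTS; ++i) {
        unsigned idx = indices[i];
        out[i] = ops[idx / NUM_ELTS][idx % NUM_ELTS];
    }
}
```

With an immediate of 0x00 both shuffle operands end up being `a`, which is the "same source used by both operands" case the message mentions.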
|
|
|
|
| |
llvm-svn: 305551
|
|
|
|
| |
llvm-svn: 301749
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
x86 has undef SSE/AVX intrinsics that should represent a bogus register operand.
This is not the same as LLVM's undef value which can take on multiple bit patterns.
There are better solutions / follow-ups to this discussed here:
https://bugs.llvm.org/show_bug.cgi?id=32176
...but this should prevent miscompiles with a one-line code change.
Differential Revision: https://reviews.llvm.org/D30834
llvm-svn: 297588
|
|
|
|
|
|
|
|
|
|
|
|
| |
The X86 clang/test/CodeGen/*builtins.c tests define the mm_malloc.h include
guard as a hack for avoiding its inclusion (mm_malloc.h requires a hosted
environment since it expects stdlib.h to be available - which is not the case
in these internal clang codegen tests).
This patch removes this hack and instead passes -ffreestanding to clang cc1.
Differential Revision: https://reviews.llvm.org/D24825
llvm-svn: 282581
|
|
|
|
| |
llvm-svn: 278208
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
constraints were added to _mm256_broadcast_{pd,ps} intel intrinsics.
The spec for these intrinsics is ... pretty much silent on alignment.
This is especially frustrating considering the amount of discussion of
alignment in the load and store intrinsics. So I was forced to rely on
the specification for the VBROADCASTF128 instruction.
That instruction's spec is *also* completely silent on alignment.
Fortunately, when it comes to the instruction's spec, silence is enough.
There is no #GP fault option for an underaligned address so this
instruction, and by inference the intrinsic, can read any alignment.
As it happens, the old code worked exactly this way and in fact we have
plenty of code that hands pointers with less than 16-byte alignment to
these intrinsics. This code broke pretty spectacularly with this commit.
Fortunately, the fix is super simple! Change a 16 to a 1, and ta da!
Anyways, a lot of debugging for a really boring fix. =]
llvm-svn: 278202
|
|
|
|
|
|
| |
possible problems in headers.
llvm-svn: 277696
|
|
|
|
|
|
|
|
|
|
| |
generic IR
As discussed on D22460, I've updated the vbroadcastf128 pd256/ps256 builtins to map directly to generic IR - load+splat a 128-bit vector to both lanes of a 256-bit vector.
Fix for PR28657.
llvm-svn: 276417
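The "load+splat" shape described above can be modeled in plain C. This is a rough sketch with a hypothetical helper name (`broadcast128_ps_model`); it uses `memcpy` so that no alignment is assumed for the source pointer.

```c
#include <string.h>

/* Hypothetical scalar model: load 4 floats (128 bits) from src and
   repeat them in both 128-bit lanes of an 8-float (256-bit) result. */
static void broadcast128_ps_model(const void *src, float out[8]) {
    float lane[4];
    memcpy(lane, src, sizeof lane);   /* byte copy: src may be unaligned */
    for (int i = 0; i < 8; ++i)
        out[i] = lane[i % 4];
}
```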
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
generic IR
D20859 and D20860 attempted to replace the SSE (V)CVTTPS2DQ and VCVTTPD2DQ truncating conversions with generic IR instead.
It turns out that the behaviour of these intrinsics is different enough from generic IR that this will cause problems: INF/NAN/out-of-range values are guaranteed to result in a 0x80000000 value, which plays havoc with constant folding, since folding converts them to either zero or UNDEF. This is also an issue with the scalar implementations (which were already generic IR and what I was trying to match).
This patch changes both scalar and packed versions back to using x86-specific builtins.
It also deals with the other scalar conversion cases that are runtime rounding mode dependent and can have similar issues with constant folding.
Differential Revision: https://reviews.llvm.org/D22105
llvm-svn: 276102
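The difference from a plain C cast can be made concrete with a scalar model of the truncating conversion. This is a sketch under assumptions: `cvtt_f32_to_i32_model` is a hypothetical name, and it only models the result value, not exception flags.

```c
#include <stdint.h>

/* Hypothetical model of a truncating f32 -> i32 conversion in the x86 style:
   NaN, infinities and out-of-range inputs all yield 0x80000000. */
static int32_t cvtt_f32_to_i32_model(float x) {
    /* NaN fails every comparison, so it also takes this branch. */
    if (!(x >= -2147483648.0f && x < 2147483648.0f))
        return INT32_MIN;   /* 0x80000000 */
    return (int32_t)x;      /* in range: the C cast truncates toward zero */
}
```

A bare `(int32_t)x` has undefined behavior in C for the inputs handled by the first branch, which is why an optimizer is free to fold them to something other than 0x80000000.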
|
|
|
|
|
|
| |
when the second source is unused. Rewrite immediate extractions in shuffle intrinsics to be in ((c >> x) & y) form instead of ((c & z) >> x). This way only x varies between each use instead of having to vary x and z.
llvm-svn: 274525
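The two forms of immediate extraction mentioned above compute the same value. A small sketch for 2-bit fields of an 8-bit shuffle immediate follows; the function names are hypothetical and not taken from the headers.

```c
/* Extract 2-bit field number n (0..3) from a shuffle immediate c. */

/* ((c & z) >> x): both the mask z and the shift x change with n. */
static unsigned field_mask_then_shift(unsigned c, unsigned n) {
    return (c & (0x3u << (2 * n))) >> (2 * n);
}

/* ((c >> x) & y): the mask y is always 0x3, only the shift x changes. */
static unsigned field_shift_then_mask(unsigned c, unsigned n) {
    return (c >> (2 * n)) & 0x3u;
}
```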
|
|
|
|
|
|
|
|
|
|
|
|
| |
f32/f64 to i32 with generic IR (clang)
The 'cvtt' truncation (round to zero) conversions can be safely represented as generic __builtin_convertvector (fptosi) calls instead of x86 intrinsics. We already do this (implicitly) for the scalar equivalents.
Note: I looked at updating _mm_cvttpd_epi32 as well but this still requires a lot more backend work to correctly lower (both for debug and optimized builds).
Differential Revision: http://reviews.llvm.org/D20859
llvm-svn: 271436
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
state reading intrinsics)
Adding LLVM front-end support to two intrinsics dealing with bit scan: _bit_scan_forward and _bit_scan_reverse.
Their functionality is as described in the Intel intrinsics guide:
https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_bit_scan_forward&expand=371,370
https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_bit_scan_reverse&expand=371,370
Furthermore, adding clang front-end support to these conversion intrinsics: _mm256_cvtsd_f64, _mm256_cvtsi256_si32 and _mm256_cvtss_f32.
Finally, adding tests to all of the above, as well as to the state reading intrinsics _rdpmc and _rdtsc.
Their functionality is also specified in the Intel intrinsics guide.
Commit on behalf of Omer Paparo Bivas
llvm-svn: 271387
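For readers unfamiliar with the two bit-scan intrinsics, their results can be modeled with simple loops. This is only a sketch: the `_model` names are hypothetical, and returning -1 for a zero input is an assumption made here for illustration, since the result of the real intrinsics is not meaningful for zero.

```c
#include <stdint.h>

/* Index of the lowest set bit (models _bit_scan_forward). */
static int bit_scan_forward_model(uint32_t x) {
    for (int i = 0; i < 32; ++i)
        if (x & (UINT32_C(1) << i))
            return i;
    return -1;  /* assumption: sentinel for x == 0 */
}

/* Index of the highest set bit (models _bit_scan_reverse). */
static int bit_scan_reverse_model(uint32_t x) {
    for (int i = 31; i >= 0; --i)
        if (x & (UINT32_C(1) << i))
            return i;
    return -1;  /* assumption: sentinel for x == 0 */
}
```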
|
|
|
|
| |
llvm-svn: 271227
|
|
|
|
|
|
|
|
| |
that will compile to a native unaligned store. Remove the builtins since they are no longer used.
Intrinsics will be removed from llvm in a future commit.
llvm-svn: 271214
|
|
|
|
|
|
|
|
|
|
| |
Both the (V)CVTDQ2PD(Y) (i32 to f64) and (V)CVTPS2PD(Y) (f32 to f64) conversion instructions are lossless and can be safely represented as generic __builtin_convertvector calls instead of x86 intrinsics without affecting final codegen.
This patch removes the clang builtins and their use in the sse2/avx headers - a future patch will deal with removing the llvm intrinsics, but that will require a bit more work.
Differential Revision: http://reviews.llvm.org/D20528
llvm-svn: 270499
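The "lossless" claim above can be spot-checked in scalar C: widening to double and narrowing back recovers the original value. The helper names below are hypothetical, and the float check is only meaningful for non-NaN inputs.

```c
#include <stdint.h>

/* i32 -> f64 -> i32: every 32-bit integer is exactly representable in a double. */
static int i32_roundtrips_through_f64(int32_t v) {
    double d = (double)v;
    return (int32_t)d == v;
}

/* f32 -> f64 -> f32: every float value is exactly representable in a double. */
static int f32_roundtrips_through_f64(float f) {
    double d = (double)f;
    return (float)d == f;
}
```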
|
|
|
|
|
|
|
|
|
|
| |
Ensure _mm256_extract_epi8 and _mm256_extract_epi16 zero extend their i8/i16 result to i32. This matches _mm_extract_epi8 and _mm_extract_epi16.
Fix for PR27594
Differential Revision: http://reviews.llvm.org/D20468
llvm-svn: 270330
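The zero-extension behavior described above differs from what a plain signed element read gives. A scalar sketch with hypothetical helper names:

```c
#include <stdint.h>

/* Zero extend: the 8-bit element is treated as unsigned before widening. */
static int extract_zext8_model(const int8_t *v, unsigned idx) {
    return (int)(uint8_t)v[idx];
}

/* Sign extend: what a direct widening of a signed char element would give. */
static int extract_sext8_model(const int8_t *v, unsigned idx) {
    return (int)v[idx];
}
```

For an element holding -1, the zero-extended result is 255 while the sign-extended result is -1.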
|
|
|
|
|
|
| |
tests
llvm-svn: 270227
|
|
|
|
| |
llvm-svn: 270212
|
|
|
|
|
|
| |
llvm/test/CodeGen/X86/avx-intrinsics-fast-isel.ll will be synced to this
llvm-svn: 270210
|
|
|
|
| |
llvm-svn: 269932
|
|
|
|
|
|
| |
about optimization options.
llvm-svn: 250271
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Added missing SSE/AVX 'undefined' intrinsics (PR24040):
_mm_undefined_pd, _mm_undefined_ps + _mm_undefined_si128
_mm256_undefined_pd, _mm256_undefined_ps + _mm256_undefined_si256
_mm512_undefined, _mm512_undefined_ps, _mm512_undefined_pd + _mm512_undefined_epi32
Added builtin intrinsics:
__builtin_ia32_undef128, __builtin_ia32_undef256 + __builtin_ia32_undef512
Differential Revision: http://reviews.llvm.org/D12052
llvm-svn: 246083
|
|
|
|
| |
llvm-svn: 230795
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The definition for _mm256_insert_epi64 was taking an int, which would get
truncated before being inserted in the vector.
Original patch by Joshua Magee!
Reviewers: bruno, craig.topper
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D7179
llvm-svn: 229811
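The truncation described above can be modeled without any intrinsics. This is a sketch with hypothetical names, assuming a 32-bit `int`; the narrowing goes through `uint32_t` so the example stays well defined for the value used.

```c
#include <stdint.h>

/* Models a 64-bit insert whose value parameter is declared as int:
   only the low 32 bits survive before the value is widened again. */
static int64_t insert_with_int_param_model(int64_t value) {
    int narrowed = (int)(uint32_t)value;  /* assumes 32-bit int */
    return (int64_t)narrowed;
}

/* Models the fixed signature: the full 64-bit value is kept. */
static int64_t insert_with_i64_param_model(int64_t value) {
    return value;
}
```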
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Most of the clang header patch by Simon Pilgrim @ SCEE.
Also fixed (or added) clang tests for these intrinsics.
LLVM tests to make sure we get the blend instruction out of these
shufflevectors are at http://reviews.llvm.org/D3600
Reviewers: eli.friedman, craig.topper, rafael
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D3601
llvm-svn: 208664
|
|
|
|
|
|
|
|
|
| |
Now, all extract & insert intrinsics should have the correct AND operation
to ignore the higher bits of the index.
rdar://15250497
llvm-svn: 193267
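The masking described above amounts to wrapping the index to the lane count with a bitwise AND. A minimal scalar sketch, with a hypothetical function name and an 8-lane vector of 32-bit elements assumed:

```c
enum { NUM_LANES = 8 };  /* assumption: 8 x 32-bit lanes; must be a power of two */

/* The AND discards the higher bits of the index, so it always lands on a valid lane. */
static int extract_epi32_model(const int v[NUM_LANES], int index) {
    return v[index & (NUM_LANES - 1)];
}
```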
|
|
|
|
|
|
| |
properly.
llvm-svn: 161319
|
|
|
|
| |
llvm-svn: 148944
|
|
|
|
| |
llvm-svn: 148925
|
|
change was made for 128-bit versions a while back.
llvm-svn: 148919
|