Reapply r374240 with a fix for the OCaml test, namely Bindings/OCaml/core.ml.
Differential Revision: https://reviews.llvm.org/D61675
llvm-svn: 374782
 
This reverts commit r374240. It broke OCaml tests:
http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/19014
llvm-svn: 374354
 
Also update Clang to call Builder.CreateFNeg(...) for UnaryMinus.
Differential Revision: https://reviews.llvm.org/D61675
llvm-svn: 374240
 
This patch changes the following tests to run under the new pass manager only:
```
Clang :: CodeGen/avx512-reduceMinMaxIntrin.c (1 of 4)
Clang :: CodeGen/avx512vl-builtins.c (2 of 4)
Clang :: CodeGen/avx512vlbw-builtins.c (3 of 4)
Clang :: CodeGen/avx512f-builtins.c (4 of 4)
```
The new PM added extra bitcasts that weren't checked before. For
reduceMinMaxIntrin.c, the issue was mostly the allocas being in a different
order. Other changes involved extra bitcasts and differently ordered loads and
stores, but the logic should still be the same.
Differential revision: https://reviews.llvm.org/D65110
llvm-svn: 367157
 
_MM_FROUND_CUR_DIRECTION is the behavior of the intrinsics that
don't take a rounding mode argument. So a better test uses
_MM_FROUND_NO_EXC with the SAE-only intrinsics and an explicit
rounding mode with the intrinsics that support embedded rounding.
llvm-svn: 364127
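A minimal sketch (not from the commit) of the distinction, assuming the standard AVX-512 intrinsics:
```c
#include <immintrin.h>

__m512 round_demo(__m512 a, __m512 b) {
  // SAE-only intrinsic: VMAXPS has no embedded rounding, so only
  // _MM_FROUND_NO_EXC (suppress all exceptions) is meaningful here.
  __m512 m = _mm512_max_round_ps(a, b, _MM_FROUND_NO_EXC);
  // Embedded-rounding intrinsic: pass an explicit mode instead of
  // _MM_FROUND_CUR_DIRECTION, which just mirrors the no-argument form.
  return _mm512_add_round_ps(m, b, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
}
```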
 
_mm256_maskz_cvtps_ph aliases for their corresponding cvt_roundps_ph intrinsic.
These intrinsics should always take an immediate for the rounding mode.
The base instruction comes from before EVEX embedded rounding. The
user should always provide the immediate rather than us assuming
CUR_DIRECTION.
Make the 512-bit versions explicit aliases as well, instead of
copy-pasting the code.
llvm-svn: 363961
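For illustration only, a hedged sketch of the intended usage, with the immediate always spelled out:
```c
#include <immintrin.h>

__m128i cvt_demo(__mmask8 k, __m256 a) {
  // The rounding mode is a required immediate; the intrinsic never
  // assumes _MM_FROUND_CUR_DIRECTION on the caller's behalf.
  return _mm256_maskz_cvtps_ph(k, a, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
}
```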
 
for the mask argument.
Custom lower the builtins to these intrinsics. This enables the middle end to optimize out bitcasts for the masks.
llvm-svn: 352344
 
scalar integer to vXi1 for the mask arguments to the intrinsics.
llvm-svn: 351408
 
vXi1 vector instead of a scalar
We need to custom handle these so we can turn the scalar mask into a vXi1 vector.
Differential Revision: https://reviews.llvm.org/D56530
llvm-svn: 351390
 
As noted on PR40203, for gcc compatibility we need to support non-immediate values in the 'slli/srli/srai' shift-by-immediate vector intrinsics.
llvm-svn: 350619
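A small sketch (not from the commit) of the now-accepted gcc-compatible usage:
```c
#include <emmintrin.h>

// The shift amount no longer has to be a compile-time constant; gcc
// accepts a runtime value here and, with this change, clang does too.
__m128i shift_demo(__m128i v, int n) {
  return _mm_slli_epi32(v, n);
}
```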
 
intrinsics (clang)
This emits FSHL/FSHR generic intrinsics for the XOP VPROT and AVX512 VPROL/VPROR rotation intrinsics.
LLVM counterpart: https://reviews.llvm.org/D55938
Differential Revision: https://reviews.llvm.org/D55937
llvm-svn: 349796
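A minimal sketch, assuming one of the AVX-512 rotate intrinsics covered by the patch:
```c
#include <immintrin.h>

__m512i rotate_demo(__m512i v) {
  // Rotate-left by a constant; per this patch it is emitted as the
  // generic funnel shift llvm.fshl.v16i32 (with both value operands
  // equal), which the backend matches back to VPROLD.
  return _mm512_rol_epi32(v, 7);
}
```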
 
This adds
_mm_loadu_epi8, _mm256_loadu_epi8, _mm512_loadu_epi8
_mm_loadu_epi16, _mm256_loadu_epi16, _mm512_loadu_epi16
_mm_storeu_epi8, _mm256_storeu_epi8, _mm512_storeu_epi8
_mm_storeu_epi16, _mm256_storeu_epi16, _mm512_storeu_epi16
llvm-svn: 344862
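A usage sketch (illustrative, not from the commit) of the new unaligned byte loads/stores:
```c
#include <immintrin.h>
#include <stdint.h>

void copy32_demo(const int8_t *src, int8_t *dst) {
  __m256i v = _mm256_loadu_epi8(src);  // no alignment requirement
  _mm256_storeu_epi8(dst, v);
}
```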
 
This adds
_mm_and_epi32, _mm_and_epi64
_mm_andnot_epi32, _mm_andnot_epi64
_mm_or_epi32, _mm_or_epi64
_mm_xor_epi32, _mm_xor_epi64
_mm256_and_epi32, _mm256_and_epi64
_mm256_andnot_epi32, _mm256_andnot_epi64
_mm256_or_epi32, _mm256_or_epi64
_mm256_xor_epi32, _mm256_xor_epi64
_mm_loadu_epi32, _mm_loadu_epi64
_mm_load_epi32, _mm_load_epi64
_mm256_loadu_epi32, _mm256_loadu_epi64
_mm256_load_epi32, _mm256_load_epi64
_mm512_loadu_epi32, _mm512_loadu_epi64
_mm512_load_epi32, _mm512_load_epi64
_mm_storeu_epi32, _mm_storeu_epi64
_mm_store_epi32, _mm_store_epi64
_mm256_storeu_epi32, _mm256_storeu_epi64
_mm256_store_epi32, _mm256_store_epi64
_mm512_storeu_epi32, _mm512_storeu_epi64
_mm512_store_epi32, _mm512_store_epi64
llvm-svn: 344861
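A short sketch (illustrative) combining the new loads with the logic intrinsics; note the aligned/unaligned distinction:
```c
#include <immintrin.h>

__m512i xor_demo(const int *p, const int *q) {
  __m512i a = _mm512_loadu_epi32(p);  // unaligned load
  __m512i b = _mm512_load_epi32(q);   // q must be 64-byte aligned
  return _mm512_xor_epi32(a, b);
}
```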
 
This patch lowers the _mm[256|512]_cvtepi{64|32|16}_epi{32|16|8} intrinsics to
native IR in cases where the result's length is less than 128 bits.
The resulting IR for 256-bit inputs is folded into VPMOV instructions, while for
128-bit inputs the vpshufb (or, in the 64-to-32-bit case, vinsertps)
instructions are generated instead.
Differential Revision: https://reviews.llvm.org/D48712
llvm-svn: 336643
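A hedged sketch of one of the sub-128-bit cases described above:
```c
#include <immintrin.h>

__m128i narrow_demo(__m128i v) {
  // 2 x i64 -> 2 x i32 uses only 64 bits of the 128-bit result, so per
  // this patch it is emitted as native IR (trunc plus shuffle) that the
  // backend folds into vpshufb.
  return _mm_cvtepi64_epi32(v);
}
```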
 
fmaddsub/fmsubadd IR emission.
Shufflevector is easier to generate and matches what the backend pattern matches without relying on constant selects being turned into shuffles.
While I was there I also made the IR regular expressions a little stricter to ensure operand order on the shuffle.
llvm-svn: 336388
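A minimal sketch, assuming the FMA intrinsic family this applies to:
```c
#include <immintrin.h>

__m256 fmaddsub_demo(__m256 a, __m256 b, __m256 c) {
  // Per this patch, emitted as two fma calls (one with c negated) whose
  // lanes are interleaved by a shufflevector, rather than a select on a
  // constant mask vector.
  return _mm256_fmaddsub_ps(a, b, c);
}
```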
 
This patch removes an optimization used with the TRUE/FALSE
predicates, as was suggested in https://reviews.llvm.org/D45616
for r335339.
The optimization was buggy, since r335339 used it also
for *_mask builtins, without actually applying the mask -- the
mask argument was just ignored.
Reviewers: craig.topper, uriel.k, RKSimon, andrew.w.kaylor, spatel, scanon, efriedma
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D48715
llvm-svn: 336355
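An illustrative case (not from the commit) of why the mask cannot be ignored:
```c
#include <immintrin.h>

__mmask8 true_pred_demo(__mmask8 k, __m128 a, __m128 b) {
  // _CMP_TRUE_UQ is "true" for every lane, but the result must still be
  // ANDed with the incoming mask k; the removed optimization folded the
  // whole call to all-ones and dropped k entirely.
  return _mm_mask_cmp_ps_mask(k, a, b, _CMP_TRUE_UQ);
}
```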
 
Add test cases with each predicate using the following
intrinsics:
_mm_cmp_pd
_mm_cmp_ps
_mm256_cmp_pd
_mm256_cmp_ps
_mm_cmp_pd_mask
_mm_cmp_ps_mask
_mm256_cmp_pd_mask
_mm256_cmp_ps_mask
_mm512_cmp_pd_mask
_mm512_cmp_ps_mask
_mm_mask_cmp_pd_mask
_mm_mask_cmp_ps_mask
_mm256_mask_cmp_pd_mask
_mm256_mask_cmp_ps_mask
_mm512_mask_cmp_pd_mask
_mm512_mask_cmp_ps_mask
Some of these are marked with FIXME, as there is a bug in lowering
e.g. _mm512_mask_cmp_ps_mask.
llvm-svn: 336346
 
All of these found by grepping through IR from the builtin tests for extra trunc and zext/sext instructions that shouldn't have been there.
Some of these were real bugs where we lost bits from the user input:
_mm512_mask_broadcast_f32x8
_mm512_maskz_broadcast_f32x8
_mm512_mask_broadcast_i32x8
_mm512_maskz_broadcast_i32x8
_mm256_mask_cvtusepi16_storeu_epi8
llvm-svn: 336042
 
instead.
llvm-svn: 336036
 
Summary:
Lowering some vector comparison builtins to fcmp IR instructions.
This ignores the signaling behaviour specified in the predicate
argument of said builtins.
Affected AVX512 builtins:
__builtin_ia32_cmpps128_mask
__builtin_ia32_cmpps256_mask
__builtin_ia32_cmpps512_mask
__builtin_ia32_cmppd128_mask
__builtin_ia32_cmppd256_mask
__builtin_ia32_cmppd512_mask
Reviewers: craig.topper, uriel.k, RKSimon, andrew.w.kaylor, spatel, scanon, efriedma
Reviewed By: craig.topper, spatel, efriedma
Differential Revision: https://reviews.llvm.org/D45616
llvm-svn: 335339
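A minimal sketch of the lowering described above (assumed mapping):
```c
#include <immintrin.h>

__mmask8 cmp_demo(__m256 a, __m256 b) {
  // An ordered, quiet less-than predicate now lowers to a plain
  // `fcmp olt <8 x float>`; the signaling/quiet distinction encoded in
  // the predicate immediate is ignored.
  return _mm256_cmp_ps_mask(a, b, _CMP_LT_OQ);
}
```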
 
Reviewers: craig.topper, spatel, RKSimon, igorb, uriel.k
Reviewed By: craig.topper
Subscribers: tkrupa, cfe-commits
Differential Revision: https://reviews.llvm.org/D41168
llvm-svn: 334850
 
to implement expandload/compressstore builtins.
Summary: We've had these target independent intrinsics for at least a year and a half. Looks like they do exactly what we need here and the backend already supports them.
Reviewers: RKSimon, delena, spatel, GBuella
Reviewed By: RKSimon
Subscribers: cfe-commits, llvm-commits
Differential Revision: https://reviews.llvm.org/D47693
llvm-svn: 334366
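A usage sketch; the call below should now funnel through the target-independent intrinsic:
```c
#include <immintrin.h>

__m512i expand_demo(__m512i src, __mmask16 k, const int *mem) {
  // Lowered to llvm.masked.expandload rather than an x86-specific
  // builtin; lanes with a zero mask bit keep the value from src.
  return _mm512_mask_expandloadu_epi32(src, k, mem);
}
```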
 
checking.
llvm-svn: 334311
 
and immediate range checking.
llvm-svn: 334265
 
checking and immediate range checking.
Test changes are due to differences in how we generate undef elements now. We also changed the types used for extractf128_si256/insertf128_si256 to match the signature of the builtin that previously existed, which this patch resurrects. This also matches gcc.
llvm-svn: 334261
 
checking.
llvm-svn: 334256
 
target feature checking and immediate range checking.
llvm-svn: 334244
 
This patch replaces all packed (and scalar without rounding
mode) fused intrinsics with fmadd/fmaddsub variations.
Then fmadd/fmaddsub are lowered to native IR.
Patch by tkrupa
Reviewers: craig.topper, sroland, spatel, RKSimon
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D47444
llvm-svn: 333555
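A minimal sketch, assuming one of the packed intrinsics covered by the patch:
```c
#include <immintrin.h>

__m512 fma_demo(__m512 a, __m512 b, __m512 c) {
  // The masked and unmasked AVX-512 forms now funnel through a common
  // fmadd builtin that is lowered to native IR instead of one
  // x86-specific intrinsic per variant.
  return _mm512_fmadd_ps(a, b, c);
}
```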
 
to a single version without masking. Use select builtins with appropriate operand instead.
llvm-svn: 333387
 
instruction instead.
Because the intrinsics in the headers are implemented as macros, we can't just use a select builtin and a pternlog builtin: that would require one of the macro arguments to be used twice and, depending on what was passed, we could expand a side-effecting expression twice, leading to weird behavior. We could declare a local variable in the macro, but then we would have to worry about name collisions.
To avoid that just generate IR directly in CGBuiltin.cpp.
Differential Revision: https://reviews.llvm.org/D47125
llvm-svn: 332891
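A generic illustration (hypothetical macro, not from the header) of the double-expansion hazard being avoided:
```c
// A macro that expands its argument twice evaluates side effects twice
// (here the expansion ((i++) + (i++)) is even undefined behavior in C).
#define TWICE(x) ((x) + (x))

int twice_demo(int i) {
  return TWICE(i++);
}
```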
 
packed float conversion intrinsics.
I believe this is safe assuming the default FP environment. The conversion might be inexact, but it can never overflow the FP type, so this shouldn't be undefined behavior for the uitofp/sitofp instructions.
We already do something similar for scalar conversions.
Differential Revision: https://reviews.llvm.org/D46863
llvm-svn: 332882
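A sketch of one conversion assumed to be covered by this change:
```c
#include <immintrin.h>

__m512 cvt_u32_demo(__m512i v) {
  // Emitted as a plain `uitofp <16 x i32> ... to <16 x float>`: possibly
  // inexact, but it can never overflow the float range, so the IR
  // semantics are safe under the default FP environment.
  return _mm512_cvtepu32_ps(v);
}
```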
 
in IR instead.
Someday maybe we'll use selects for all the builtins.
llvm-svn: 332825
 
the avx512 truncate builtins.
The masking doesn't work right in the backend for the ones that produce byte or word elements without avx512bw.
llvm-svn: 332322
 
builtins.
As long as the destination type is a 256-bit or 128-bit vector with the same number of elements, we can use __builtin_convertvector to directly generate a trunc IR instruction, which will be handled natively by the backend.
Differential Revision: https://reviews.llvm.org/D46742
llvm-svn: 332266
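A self-contained sketch of the __builtin_convertvector pattern the headers now use:
```c
typedef int   v8i32 __attribute__((vector_size(32)));
typedef short v8i16 __attribute__((vector_size(16)));

// Same element count, narrower elements: this compiles to a single
// `trunc <8 x i32> ... to <8 x i16>` IR instruction.
v8i16 narrow16_demo(v8i32 v) {
  return __builtin_convertvector(v, v8i16);
}
```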
 
The unmasked versions already didn't have this restriction. I don't think gcc or icc limit these to 64-bit mode, so we shouldn't either.
llvm-svn: 330681
 
I believe all the pieces are now in place in the backend to make this work correctly. We can either mask the inputs to 32 bits for pmuludq, or shl/ashr them for pmuldq, and then use a regular mul instruction. The backend should combine this to PMULUDQ/PMULDQ, and then SimplifyDemandedBits will remove the and/shifts.
Differential Revision: https://reviews.llvm.org/D45421
llvm-svn: 329605
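A sketch of the affected intrinsic; the comment paraphrases the lowering described above:
```c
#include <immintrin.h>

__m128i mul_demo(__m128i a, __m128i b) {
  // Emitted as: AND each operand to its low 32 bits per 64-bit lane,
  // then a plain `mul <2 x i64>`; the backend recombines this into
  // PMULUDQ and SimplifyDemandedBits strips the masking.
  return _mm_mul_epu32(a, b);
}
```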
 
sse/avx builtins and a select.
llvm-svn: 326039
 
__builtin_ia32_permvarsi256_mask and use the avx2 unmasked versions and a select instead.
llvm-svn: 326022
 
return vXi1 mask. Make bitcasts to scalar explicit in IR
Summary: This is the clang equivalent of r324827
Reviewers: zvi, delena, RKSimon, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43143
llvm-svn: 324828
 
Change the intrinsics' header files, lowering the test and testn intrinsics to IR code.
Removed the test and testn builtins from clang.
Differential Revision: https://reviews.llvm.org/D38737
llvm-svn: 318035
 
This patch, together with a matching llvm patch (https://reviews.llvm.org/D38671), implements the lowering of X86 shuffle i/f intrinsics to IR.
Differential Revision: https://reviews.llvm.org/D38672
Change-Id: I9b3c2f2b34323bd9ccb21d0c1832f848b88ec047
llvm-svn: 318025
 
Differential Revision: https://reviews.llvm.org/D38231
Change-Id: I80bbff9cbe93e4be54d8a761ef9723edf3f57c57
llvm-svn: 314102
 
This patch, together with a matching llvm patch (https://reviews.llvm.org/D37669), implements the lowering of X86 mask set1 intrinsics to IR.
Differential Revision: https://reviews.llvm.org/D37668
llvm-svn: 313624
 
This patch, together with a matching llvm patch (https://reviews.llvm.org/D37693), implements the lowering of X86 ABS intrinsics to IR.
Differential Revision: https://reviews.llvm.org/D37694
llvm-svn: 313133
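A minimal sketch, assuming one of the ABS intrinsics covered:
```c
#include <immintrin.h>

__m512i abs_demo(__m512i v) {
  // Now lowered to generic IR (a compare/negate/select pattern) rather
  // than an x86-specific intrinsic call.
  return _mm512_abs_epi32(v);
}
```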
 
x86 has undef SSE/AVX intrinsics that should represent a bogus register operand.
This is not the same as LLVM's undef value which can take on multiple bit patterns.
There are better solutions / follow-ups to this discussed here:
https://bugs.llvm.org/show_bug.cgi?id=32176
...but this should prevent miscompiles with a one-line code change.
Differential Revision: https://reviews.llvm.org/D30834
llvm-svn: 297588
 
Verified that the backend codegens this equally well.
llvm-svn: 292329
 
select and the avx unmasked builtins.
llvm-svn: 289338
 
Missed in rL287733
llvm-svn: 287755
 
llvm-svn: 287733
 
generic IR
Both the (V)CVTDQ2PD (i32 to f64) and (V)CVTUDQ2PD (u32 to f64) conversion instructions are lossless and can be safely represented as generic __builtin_convertvector calls instead of x86 intrinsics without affecting final codegen.
This patch removes the clang builtins and their use in the headers - a future patch will deal with removing the llvm intrinsics.
This is an extension patch to D20528 which dealt with the equivalent sse/avx cases.
Differential Revision: https://reviews.llvm.org/D26686
llvm-svn: 287088
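A self-contained sketch of the lossless widening conversion as generic code:
```c
typedef int    v4i32 __attribute__((vector_size(16)));
typedef double v4f64 __attribute__((vector_size(32)));

// Every i32 is exactly representable as an f64, so the element-wise
// conversion is lossless and needs no x86-specific builtin.
v4f64 widen_demo(v4i32 v) {
  return __builtin_convertvector(v, v4f64);
}
```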