| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
| |
to pass the first input a second time.
This is more consistent with other usages of builtin_shufflevector. Later optimization passes or codegen will detect the duplicate vector and replace it with undef. Using _mm_undefined just puts a zeroinitializer that still needs to be optimized out later.
llvm-svn: 333944
|
|
|
|
|
|
|
|
| |
equivalents.
Previously we were just passing -1 mask to the masked builtin. This changes it to the more generic way that the 128/256 bit use.
llvm-svn: 333626
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch replaces all packed (and scalar without rounding
mode) fused intrinsics with fmadd/fmaddsub variations.
Then fmadd/fmaddsub are lowered to native IR.
Patch by tkrupa
Reviewers: craig.topper, sroland, spatel, RKSimon
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D47444
llvm-svn: 333555
|
|
|
|
|
|
| |
to a single version without masking. Use select builtins with appropriate operand instead.
llvm-svn: 333387
|
|
|
|
|
|
|
|
|
|
|
|
| |
instruction instead.
Because the intrinsics in the headers are implemented as macros, we can't just use a select builtin and pternlog builtin. This would require one of the macro arguments to be used twice. Depending on what was passed to the macro we could expand an expression twice leading to weird behavior. We could maybe declare our local variable in the macro, but that would need to worry about name collisions.
To avoid that just generate IR directly in CGBuiltin.cpp.
Differential Revision: https://reviews.llvm.org/D47125
llvm-svn: 332891
|
|
|
|
|
|
|
|
|
|
|
|
| |
packed float conversion intrinsics.
I believe this is safe assuming default default FP environment. The conversion might be inexact, but it can never overflow the FP type so this shouldn't be undefined behavior for the uitofp/sitofp instructions.
We already do something similar for scalar conversions.
Differential Revision: https://reviews.llvm.org/D46863
llvm-svn: 332882
|
|
|
|
|
|
|
|
| |
in IR instead.
Someday maybe we'll use selects for all the builtins.
llvm-svn: 332825
|
|
|
|
|
|
|
|
| |
the avx512 truncate builtins.
The masking doesn't work right in the backend for the ones that produce byte or word elements without avx512bw.
llvm-svn: 332322
|
|
|
|
|
|
|
|
|
|
| |
builtins.
As long as the destination type is a 256 or 128 bit vector with the same number of elements we can use __builtin_convertvector to directly generate trunc IR instruction which will be handled natively by the backend.
Differential Revision: https://reviews.llvm.org/D46742
llvm-svn: 332266
|
|
|
|
|
|
| |
_mm512_mask_cvtps_pd and _mm512_maskz_cvtps_pd.
llvm-svn: 332213
|
|
|
|
|
|
|
|
| |
If we're using default rounding mode we can let __builtin_convertvector to generate an fpextend. This matches 128 and 256 bit.
If we're using the version that takes an explicit rounding mode argument we would need to look at the immediate to see if its CUR_DIRECTION.
llvm-svn: 332210
|
|
|
|
|
|
|
|
|
|
| |
_mm_cvtu64_ss.
We can use direct C code for these that will use uitofp and insertelement instructions.
For the versions that take an explicit rounding mode we can't do this.
llvm-svn: 332203
|
|
|
|
|
|
|
|
|
|
| |
not use a 512-bit intermediate vector.
This is unnecessary for AVX512VL supporting CPUs like SKX. We can just emit a 128-bit masked load/store here no matter what. The backend will widen it to 512-bits on KNL CPUs.
Fixes the frontend portion of PR37386. Need to fix the backend to optimize the new sequences well.
llvm-svn: 331958
|
|
|
|
|
|
|
|
|
|
| |
intrinsics to match icc.
On AVX512F targets we'll produce an emulated sequence using 3 pmuludqs with shifts and adds. On AVX512DQ we'll use vpmulld.
Fixes PR37140.
llvm-svn: 330923
|
|
|
|
|
|
| |
The unmasked versions already didn't have this restrction. I don't think gcc or icc limit these to 64-bit mode so we shouldn't either.
llvm-svn: 330681
|
|
|
|
| |
llvm-svn: 329689
|
|
|
|
|
|
|
|
| |
I believe all the pieces are now in place in the backend to make this work correctly. We can either mask the input to 32 bits for pmuludg or shl/ashr for pmuldq and use a regular mul instruction. The backend should combine this to PMULUDQ/PMULDQ and then SimplifyDemandedBits will remove the and/shifts.
Differential Revision: https://reviews.llvm.org/D45421
llvm-svn: 329605
|
|
|
|
|
|
|
|
| |
The second operand needs to be in the lower bits of the concatenation. This matches llvm 5.0, gcc, and icc behavior.
Fixes PR36360.
llvm-svn: 324954
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
return vXi1 mask. Make bitcasts to scalar explicit in IR
Summary: This is the clang equivalent of r324827
Reviewers: zvi, delena, RKSimon, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43143
llvm-svn: 324828
|
|
|
|
| |
llvm-svn: 324647
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
integer shift/and/or
Summary:
kunpck intrinsics were removed in favor of native IR a few months ago. The implementation lowers them as by operation on the integer types passed to the intrinsic and then just shifting, masking, and oring them together. A special X86 DAG combine was added to recognize this patter and turn it into a concat_vector operation.
I think it makes more sense to keep the IR implementation closer to vector operations on vXi1. Given that we expect these builtins to be used around other builtins that operate on k-registers which we try to represent in IR with vXi1. InstCombine should be able to get rid of the bitcasts between integers and vXi1 leaving only the vector operations.
Reviewers: RKSimon, spatel, zvi, jina.nahias
Reviewed By: RKSimon
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D42016
llvm-svn: 322461
|
|
|
|
|
|
|
| |
This makes the test more resilient and consistent with the other tests
introduced in r320919.
llvm-svn: 320971
|
|
|
|
| |
llvm-svn: 320919
|
|
|
|
|
|
|
|
|
| |
This patch, together with a matching llvm patch (https://reviews.llvm.org/D39720), implements the lowering of X86 kunpack intrinsics to IR.
Differential Revision: https://reviews.llvm.org/D39719
Change-Id: Id5d3cb394ad33b98be79a6783d1d15569e2b798d
llvm-svn: 319777
|
|
|
|
|
|
|
|
|
| |
Change Header files of the intrinsics for lowering test and testn intrinsics to IR code.
Removed test and testn builtins from clang
Differential Revision: https://reviews.llvm.org/D38737
llvm-svn: 318035
|
|
|
|
|
|
|
|
|
|
| |
// CHECK: shufflevector <8 x double> %0, <8 x double> %{{.*}}, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 8, i32 9>
To
// CHECK: shufflevector <8 x double> %{{.*}}, <8 x double> %{{.*}}, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 8, i32 9>
for fixing 318025 commit warning
Change-Id: Id48a1fe1f247fe6a0b84e7189f18d2e637678e79
llvm-svn: 318031
|
|
|
|
|
|
|
|
|
| |
This patch, together with a matching llvm patch (https://reviews.llvm.org/D38671), implements the lowering of X86 shuffle i/f intrinsics to IR.
Differential Revision: https://reviews.llvm.org/D38672
Change-Id: I9b3c2f2b34323bd9ccb21d0c1832f848b88ec047
llvm-svn: 318025
|
|
|
|
|
|
| |
Missed a line in r314158.
llvm-svn: 314159
|
|
|
|
|
|
| |
I'm not sure why yet, but there may be differences depending on the host?
llvm-svn: 314158
|
|
|
|
|
|
|
|
| |
Clang regression tests that depend on the optimizer can break
when there are changes to LLVM...as in:
https://reviews.llvm.org/rL314117
llvm-svn: 314144
|
|
|
|
|
|
|
|
| |
This patch, together with a matching llvm patch (https://reviews.llvm.org/D37669), implements the lowering of X86 mask set1 intrinsics to IR.
Differential Revision: https://reviews.llvm.org/D37668
llvm-svn: 313624
|
|
|
|
|
|
|
|
|
|
|
|
| |
a backend isel failure.
The __builtin_ia32_pbroadcastq512_mem_mask we were previously trying to use in 32-bit mode is not implemented in the x86 backend and causes isel to fail in release builds. In debug builds it fails even earlier during legalization with an llvm_unreachable.
While there add the missing test case for this intrinsic for this for 64-bit mode.
This fixes PR34631. D37668 should be able to recover this for 32-bit mode soon. But I wanted to fix the crash ahead of that.
llvm-svn: 313392
|
|
|
|
|
|
|
|
| |
This patch, together with a matching llvm patch (https://reviews.llvm.org/D37693), implements the lowering of X86 ABS intrinsics to IR.
Differential Revision: https://reviews.llvm.org/D37694
llvm-svn: 313133
|
|
|
|
|
|
|
|
|
|
|
|
| |
(PR33977)
Based off the Intel Intrinsics guide, we should expect a void const* argument.
Prevents 'passing 'const void *' to parameter of type 'void *' discards qualifiers' warnings.
Differential Revision: https://reviews.llvm.org/D37449
llvm-svn: 312523
|
|
|
|
|
|
|
| |
This test would fail after the proposed change in:
https://reviews.llvm.org/D34242
llvm-svn: 306433
|
|
|
|
| |
llvm-svn: 301749
|
|
|
|
|
|
|
|
|
|
| |
MOVNTDQA non-temporal aligned vector loads can be correctly represented using generic builtin loads, allowing us to remove the existing x86 intrinsics.
LLVM companion patch: D31767.
Differential Revision: https://reviews.llvm.org/D31766
llvm-svn: 300326
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D31155
llvm-svn: 298364
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Adding missing intrinsics :
_mm512_set_epi16,
_mm512_set_epi8,
_mm512_permutevar_epi32
_mm512_mask_permutevar_epi32
Reviewers: zvi, guyblank, eladcohen, craig.topper
Reviewed By: craig.topper
Subscribers: craig.topper, cfe-commits
Differential Revision: https://reviews.llvm.org/D31034
llvm-svn: 298208
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
x86 has undef SSE/AVX intrinsics that should represent a bogus register operand.
This is not the same as LLVM's undef value which can take on multiple bit patterns.
There are better solutions / follow-ups to this discussed here:
https://bugs.llvm.org/show_bug.cgi?id=32176
...but this should prevent miscompiles with a one-line code change.
Differential Revision: https://reviews.llvm.org/D30834
llvm-svn: 297588
|
|
|
|
|
|
| |
Verified that the backend codegens this equally well.
llvm-svn: 292329
|
|
|
|
|
|
| |
added unmasked versions and selects.
llvm-svn: 290580
|
|
|
|
|
|
|
|
| |
with the newly added unmasked versions and selects."
I failed to merge this with r290574.
llvm-svn: 290578
|
|
|
|
|
|
| |
added unmasked versions and selects.
llvm-svn: 290575
|
|
|
|
|
|
|
|
| |
versions without masking so wrap it with select.
This will allow the backend to constant fold these to generic shuffle vectors like 128-bit and 256-bit without having to working about handling masking.
llvm-svn: 289351
|
|
|
|
| |
llvm-svn: 287733
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
generic IR
Both the (V)CVTDQ2PD (i32 to f64) and (V)CVTUDQ2PD (u32 to f64) conversion instructions are lossless and can be safely represented as generic __builtin_convertvector calls instead of x86 intrinsics without affecting final codegen.
This patch removes the clang builtins and their use in the headers - a future patch will deal with removing the llvm intrinsics.
This is an extension patch to D20528 which dealt with the equivalent sse/avx cases.
Differential Revision: https://reviews.llvm.org/D26686
llvm-svn: 287088
|
|
|
|
|
|
|
|
| |
unmasked builtins and a select.
This is part of a set of changes to allow InstCombine in the backend to optimize variable shifts without having to know about masking.
llvm-svn: 286757
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
the mask argument of a vfmadd intrinsic.
Summary: Inverting the mask argument does not reflect the intended semantics of the intrinsic.
Reviewers: igorb, delena
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D26019
llvm-svn: 286733
|
|
|
|
|
|
|
|
| |
element builtins over to the newly added unmasked builtins and a select.
This should also fix PR30691 since the new builtins are handled like the legacy builtins in the backend.
llvm-svn: 286714
|