| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
| |
_mm_extract_pi16 and _mm_insert_pi16.
__m64 is a vector of 1 long long. But the builtins these intrinsics
are calling expect a vector of 4 shorts.
Fixes PR44589
(cherry picked from commit 16b9410caa35da976fa5f3cf6dd3d6f3776d51ca)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This makes clang somewhat forward-compatible with new CUDA releases
without having to patch it for every minor release without adding
any new function.
If an unknown version is found, clang issues a warning (can be disabled
with -Wno-cuda-unknown-version) and assumes that it has detected
the latest known version. CUDA releases are usually supersets
of older ones feature-wise, so it should be sufficient to keep
released clang versions working with minor CUDA updates without
having to upgrade clang, too.
Differential Revision: https://reviews.llvm.org/D73231
(cherry picked from commit 12fefeef203ab4ef52d19bcdbd4180608a4deae1)
|
|
|
|
|
|
| |
Wrong argument order resulted in broken shfl ops for 64-bit types.
(cherry picked from commit cc14de88da27a8178976972bdc8211c31f7ca9ae)
|
|
|
|
|
|
| |
long, since clang's <altivec.h> doesn't provide it yet!
(cherry picked from commit 388eaa1270c2762d61b756759b6db8cf15bd3a83)
|
|
|
|
|
|
|
|
|
|
|
| |
Enabling `-Wcast-qual` identified many casts in various system headers
that were dropping the `const` qualifier. Fixing those missing
qualifiers pointed out that a few of the definitions of the builtins
did not properly identify their arguments as `const` pointers. This
commit fixes those builtin definitions, and the system header files
so that they no longer drop the qualifier.
Differential Revision: https://reviews.llvm.org/D71718
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is patch C2 as mentioned in RFC
http://lists.llvm.org/pipermail/cfe-dev/2019-March/061834.html
This adds CMSE builtin functions, and introduces arm_cmse.h header which has
useful macros, functions, and data types for end-users of CMSE.
Patch by Javed Absar.
Diferential Revision: https://reviews.llvm.org/D70817
|
|
|
|
|
|
|
|
|
|
|
| |
version from immintrin.h
The forward declaration had a cdecl calling convention, but the
inline version did not. This leads to a conflict if the default
calling convention is not cdecl. Fix this by just removing the
forward declaration.
Fixes PR41503
|
|
|
|
|
|
|
|
|
|
|
|
| |
We need to use a 64-bit type in 64-bit mode so a 64-bit register
will get used in the generated assembly. I've also changed the
constraints to just use "r" intead of "q". "q" forces to a only
an a/b/c/d register in 32-bit mode, but I see no reason that
would matter here.
Fixes Nico's note in PR19301 over 4 years ago.
Differential Revision: https://reviews.llvm.org/D70101
|
|
|
|
|
|
|
|
| |
As we currently have it implemented in altivec.h, the offsets for these two
intrinsics are element offsets. The documentation in the ABI (as well as the
implementation in both XL and GCC) states that these should be byte offsets.
Differential revision: https://reviews.llvm.org/D63636
|
|
|
|
|
|
|
| |
We currently emit a double precision comparison instruction for this, whereas we
need to emit the single precision version.
Differential revision: https://reviews.llvm.org/D64024
|
|
|
|
|
|
| |
Make sure they don't both define __nop.
Differential Revision: https://reviews.llvm.org/D69012
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Writing support for three ACLE functions:
unsigned int __cls(uint32_t x)
unsigned int __clsl(unsigned long x)
unsigned int __clsll(uint64_t x)
CLS stands for "Count number of leading sign bits".
In AArch64, these two intrinsics can be translated into the 'cls'
instruction directly. In AArch32, on the other hand, this functionality
is achieved by implementing it in terms of clz (count number of leading
zeros).
Reviewers: compnerd
Reviewed By: compnerd
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D69250
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Adding support for ACLE intrinsics.
Patch by Michael Platings.
Reviewers: chill, t.p.northover, efriedma
Reviewed By: chill
Subscribers: kristof.beyls, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D69297
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit sets up the infrastructure for auto-generating <arm_mve.h>
and doing clang-side code generation for the builtins it relies on,
and demonstrates that it works by implementing a representative sample
of the ACLE intrinsics, more or less matching the ones introduced in
LLVM IR by D67158,D68699,D68700.
Like NEON, that header file will provide a set of vector types like
uint16x8_t and C functions with names like vaddq_u32(). Unlike NEON,
the ACLE spec for <arm_mve.h> includes a polymorphism system, so that
you can write plain vaddq() and disambiguate by the vector types you
pass to it.
Unlike the corresponding NEON code, I've arranged to make every user-
facing ACLE intrinsic into a clang builtin, and implement all the code
generation inside clang. So <arm_mve.h> itself contains nothing but
typedefs and function declarations, with the latter all using the new
`__attribute__((__clang_builtin))` system to arrange that the user-
facing function names correspond to the right internal BuiltinIDs.
So the new MveEmitter tablegen system specifies the full sequence of
IRBuilder operations that each user-facing ACLE intrinsic should
translate into. Where possible, the ACLE intrinsics map to standard IR
operations such as vector-typed `add` and `fadd`; where no standard
representation exists, I call down to the sample IR intrinsics
introduced in an earlier commit.
Doing it like this means that you get the polymorphism for free just
by using __attribute__((overloadable)): the clang overload resolution
decides which function declaration is the relevant one, and _then_ its
BuiltinID is looked up, so by the time we're doing code generation,
that's all been resolved by the standard system. It also means that
you get really nice error messages if the user passes the wrong
combination of types: clang will show the declarations from the header
file and explain why each one doesn't match.
(The obvious alternative approach would be to have wrapper functions
in <arm_mve.h> which pass their arguments to the underlying builtins.
But that doesn't work in the case where one of the arguments has to be
a constant integer: the wrapper function can't pass the constantness
through. So you'd have to do that case using a macro instead, and then
use C11 `_Generic` to handle the polymorphism. Then you have to add
horrible workarounds because `_Generic` requires even the untaken
branches to type-check successfully, and //then// if the user gets the
types wrong, the error message is totally unreadable!)
Reviewers: dmgreen, miyuki, ostannard
Subscribers: mgorny, javed.absar, kristof.beyls, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D67161
|
|
|
|
|
|
|
|
|
|
|
|
| |
These intrinsics use llvm.cttz intrinsics so are always available
even without the bmi feature. We already don't check for the bmi
feature on the intrinsics themselves. But we were blocking the
include of the header file with _MSC_VER unless BMI was enabled
on the command line.
Fixes PR30506.
llvm-svn: 374516
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
_castf64_u64, _castu32_f32, _castu64_f64
Summary:
Adding support for some missing intrinsics:
_castf32_u32, _castf64_u64, _castu32_f32, _castu64_f64
Reviewers: craig.topper, LuoYuanke, RKSimon, pengfei
Reviewed By: RKSimon
Subscribers: llvm-commits
Patch by yubing (Bing Yu)
Differential Revision: https://reviews.llvm.org/D67212
llvm-svn: 372802
|
|
|
|
|
|
| |
corresponding tests.
llvm-svn: 372063
|
|
|
|
| |
llvm-svn: 372061
|
|
|
|
| |
llvm-svn: 371814
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This is similar to vec_ct* in https://reviews.llvm.org/rL304205.
The argument must be a constant, otherwise instruction selection
will fail. always_inline is not enough for isel to always fold
everything away at -O0.
The fix is to turn the function into macros in altivec.h.
Fixes https://bugs.llvm.org/show_bug.cgi?id=43072
Reviewers: nemanjai, hfinkel, #powerpc, wuzish
Reviewed By: #powerpc, wuzish
Subscribers: wuzish, kbarton, MaskRay, shchenz, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D66699
llvm-svn: 370902
|
|
|
|
|
|
|
|
|
|
|
| |
vote.ballot instruction is gone in recent CUDA versions and
vote.sync.ballot can not be used because it needs a thread mask parameter.
Fortunately PTX 6.2 (introduced with CUDA-9.2) provides activemask.b32
instruction for this.
Differential Revision: https://reviews.llvm.org/D66665
llvm-svn: 370792
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
_mm512_stream_pd, _mm512_stream_si512
Reviewers: craig.topper, pengfei, LuoYuanke, RKSimon, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Patch by Bing Yu (yubing)
Differential Revision: https://reviews.llvm.org/D66786
llvm-svn: 370691
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Adding support for some missing intrinsics:
_mm512_cvtsi512_si32
Reviewers: craig.topper, pengfei, LuoYuanke, spatel, RKSimon
Reviewed By: craig.topper
Subscribers: llvm-commits
Patch by Bing Yu (yubing)
Differential Revision: https://reviews.llvm.org/D66785
llvm-svn: 370297
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D66512
llvm-svn: 369641
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In C++ mode we should only avoid adding __OPENCL_C_VERSION__,
all other predefined macros about the language mode are still
valid.
This change also fixes the language version check in the
headers accordingly.
Differential Revision: https://reviews.llvm.org/D65941
llvm-svn: 368552
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The signature "Geode by NSC" for NSC vendor is wrong.
In lib/Headers/cpuid.h, signature_NSC_edx and signature_NSC_ecx constants are inverted (cpuid signature order is ebx # edx # ecx).
Reviewers: teemperor, rsmith, craig.topper
Reviewed By: teemperor, craig.topper
Subscribers: craig.topper, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D65978
llvm-svn: 368510
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Port existing headers which include x86 intrinsics implementation to
PowerPC platform (using Altivec), along with tests. Also, tests about
including these intrinsic headers are combined.
The headers are mainly developed by Steven Munroe, with contributions
from Paul Clarke, Bill Schmidt, Jinsong Ji and Zixuan Wu.
Reviewed By: Jinsong Ji
Differential Revision: https://reviews.llvm.org/D65630
llvm-svn: 368392
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Re-commit r366322 after some fixes
TME is a future architecture technology, documented in
https://developer.arm.com/architectures/cpu-architecture/a-profile/exploration-tools
https://developer.arm.com/docs/ddi0601/a
More about the future architectures:
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/new-technologies-for-the-arm-a-profile-architecture
This patch adds support for the TME instructions TSTART, TTEST, TCOMMIT, and
TCANCEL and the target feature/arch extension "tme".
It also implements TME builtin functions, defined in ACLE Q2 2019
(https://developer.arm.com/docs/101028/latest)
Differential Revision: https://reviews.llvm.org/D64416
Patch by Javed Absar and Momchil Velikov
llvm-svn: 367428
|
|
|
|
|
|
|
|
|
|
|
|
| |
Move the platform check out of PPC Linux toolchain code and add platform guards
to the intrinsic headers, since they are supported currently only on 64-bit
PowerPC targets.
Reviewed By: Jinsong Ji
Differential Revision: https://reviews.llvm.org/D64849
llvm-svn: 367281
|
|
|
|
| |
llvm-svn: 366699
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Defining CLK_NULL_EVENT with a `(void*)` cast has the (unintended?)
side-effect that the address space will be fixed (as generic in OpenCL
2.0 mode). The consequence is that any target specific address space
for the clk_event_t type will not be applied.
It is not clear why the void pointer cast was needed in the first
place, and it seems we can do without it.
Differential Revision: https://reviews.llvm.org/D63876
llvm-svn: 366546
|
|
|
|
|
|
|
|
|
|
|
| |
Remove dependency of malloc in implementation of mm_malloc function in PowerPC
intrinsics and alignment assumption on glibc.
Reviewed By: Hal Finkel
Differential Revision: https://reviews.llvm.org/D64850
llvm-svn: 366406
|
|
|
|
|
|
| |
This reverts r366322 (git commit 4b8da3a503e434ddbc08ecf66582475765f449bc)
llvm-svn: 366355
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
TME is a future architecture technology, documented in
https://developer.arm.com/architectures/cpu-architecture/a-profile/exploration-tools
https://developer.arm.com/docs/ddi0601/a
More about the future architectures:
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/new-technologies-for-the-arm-a-profile-architecture
This patch adds support for the TME instructions TSTART, TTEST, TCOMMIT, and
TCANCEL and the target feature/arch extension "tme".
It also implements TME builtin functions, defined in ACLE Q2 2019
(https://developer.arm.com/docs/101028/latest)
Patch by Javed Absar and Momchil Velikov
Differential Revision: https://reviews.llvm.org/D64416
llvm-svn: 366322
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The jcvt intrinsic defined in ACLE [1] is available when ARM_FEATURE_JCVT is defined.
This change introduces the AArch64 intrinsic, wires it up to the instruction and a new clang builtin function.
The __ARM_FEATURE_JCVT macro is now defined when an Armv8.3-A or higher target is used.
I've implemented the target detection logic in Clang so that this feature is enabled for architectures from armv8.3-a onwards (so -march=armv8.4-a also enables this, for example).
make check-all didn't show any new failures.
[1] https://developer.arm.com/docs/101028/latest/data-processing-intrinsics
Differential Revision: https://reviews.llvm.org/D64495
llvm-svn: 366197
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch series adds support for the next-generation arch13
CPU architecture to the SystemZ backend.
This includes:
- Basic support for the new processor and its features.
- Support for low-level builtins mapped to new LLVM intrinsics.
- New high-level intrinsics in vecintrin.h.
- Indicate support by defining __VEC__ == 10303.
Note: No currently available Z system supports the arch13
architecture. Once new systems become available, the
official system name will be added as supported -march name.
llvm-svn: 365933
|
|
|
|
|
|
|
|
| |
the store as a <2 x float> instead of i64.
This is similar to what we do for loadl_pi and loadh_pi.
llvm-svn: 365669
|
|
|
|
|
|
|
|
|
| |
We accidentally lost the ATOMIC_VAR_INIT and ATOMIC_FLAG_INIT macros
in r363794.
Also put the `memory_order` typedef back inside a `>= CL2.0` guard.
llvm-svn: 364174
|
|
|
|
|
|
|
|
| |
Identified the duplicate declarations using
sort lib/Headers/opencl-c.h | uniq -c | grep ' 2'
llvm-svn: 364173
|
|
|
|
|
|
|
|
|
| |
Add overloads with generic address space pointer to old atomics.
This is currently only added for C++ compilation mode.
Differential Revision: https://reviews.llvm.org/D62335
llvm-svn: 364071
|
|
|
|
|
|
| |
Patch by Pierre Gondois.
llvm-svn: 364020
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
_mm256_maskz_cvtps_ph aliases for their corresponding cvt_roundps_ph intrinsic.
These intrinsics should always take an immediate for the rounding mode.
The base instruction comes from before EVEX embdedded rounding. The
user should always provide the immediate rather than us assuming
CUR_DIRECTION.
Make the 512-bit versions also explicit aliases instead of copy
pasting the code.
llvm-svn: 363961
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
AIX system headers need stdint.h and inttypes.h to be re-enterable when macro _STD_TYPES_T is defined so that limit macro definitions such as UINT32_MAX can be found. This patch attempts to allow that on AIX.
Reviewers: hubert.reinterpretcast, jasonliu, mclow.lists, EricWF
Reviewed by: hubert.reinterpretcast, mclow.lists
Subscribers: jfb, jsji, christof, cfe-commits, libcxx-commits, llvm-commits
Tags: #LLVM, #clang, #libc++
Differential Revision: https://reviews.llvm.org/D59253
llvm-svn: 363939
|
|
|
|
| |
llvm-svn: 363890
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Using the -fdeclare-opencl-builtins option will require a way to
predefine types and macros such as `int4`, `CLK_GLOBAL_MEM_FENCE`,
etc. Move these out of opencl-c.h into opencl-c-base.h such that the
latter can be shared by -fdeclare-opencl-builtins and
-finclude-default-header.
This changes the behaviour of -finclude-default-header when
-fdeclare-opencl-builtins is specified: instead of including the full
header, it will include the header with only the base definitions.
Differential revision: https://reviews.llvm.org/D63256
llvm-svn: 363794
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Port emmintrin.h which include Intel SSE2 intrinsics implementation to PowerPC platform (using Altivec).
The new headers containing those implemenations are located into a directory named ppc_wrappers
which has higher priority when the platform is PowerPC on Linux. They are mainly developed by Steven Munroe,
with contributions from Paul Clarke, Bill Schmidt, Jinsong Ji and Zixuan Wu.
It's a follow-up patch of D62121.
Patched by: Qiu Chaofan <qiucf@cn.ibm.com>
Differential Revision: https://reviews.llvm.org/D62569
llvm-svn: 363122
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Scalar version :
_mm_cvtsbh_ss , _mm_cvtness_sbh
Vector version:
_mm512_cvtpbh_ps , _mm256_cvtpbh_ps
_mm512_maskz_cvtpbh_ps , _mm256_maskz_cvtpbh_ps
_mm512_mask_cvtpbh_ps , _mm256_mask_cvtpbh_ps
Patch by Shengchen Kan (skan)
Differential Revision: https://reviews.llvm.org/D62363
llvm-svn: 363018
|
|
|
|
|
|
|
|
|
|
|
|
| |
For more details about these instructions, please refer to the latest
ISE document:
https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference.
Patch by Tianqing Wang (tianqing)
Differential Revision: https://reviews.llvm.org/D62282
llvm-svn: 362685
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Remove unnecessary definition (otherwise the extension will be defined
where it's not supposed to be defined).
Consider the code:
#pragma OPENCL EXTENSION cl_intel_planar_yuv : begin
// some declarations
#pragma OPENCL EXTENSION cl_intel_planar_yuv : end
is enough for extension to become known for clang.
Patch by: Dmitry Sidorov <dmitry.sidorov@intel.com>
Reviewers: Anastasia, yaxunl
Reviewed By: Anastasia
Tags: #clang
Differential Revision: https://reviews.llvm.org/D58666
llvm-svn: 362398
|