path: root/libclc/generic/include/clc
Commit message | Author | Age | Files | Lines
...
* acosh: Use unary_decl instead of custom inc file | Jan Vesely | 2017-11-02 | 2 | -24/+6
    Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
    Reviewed-By: Aaron Watry <awatry@gmail.com>
    llvm-svn: 317233
* acos: Use unary_decl instead of custom inc file | Jan Vesely | 2017-11-02 | 2 | -2/+6
    Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
    Reviewed-By: Aaron Watry <awatry@gmail.com>
    llvm-svn: 317232
* math: Implement native_log10 | Jan Vesely | 2017-10-25 | 3 | -0/+7
    Use llvm intrinsic by default
    Provide amdgpu workaround
    v2: drop old amd copyrights
    Reviewer: Aaron Watry
    Reviewed-by: Vedran Miletić <vedran@miletic.net>
    Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
    llvm-svn: 316588
* shared: Implement aligned vector stores (vstorea_half) | Jan Vesely | 2017-10-22 | 1 | -13/+28
    Float version passes newly posted piglit tests on turks, float and double pass on carrizo.
    v2: scalar vstorea_half
    v3: fix typo
    Reviewer: Aaron Watry
    Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
    llvm-svn: 316291
* shared: Implement aligned vector loads (vloada_half) | Jan Vesely | 2017-10-22 | 1 | -17/+23
    Passes newly posted piglits on turks and carrizo
    v2: add scalar vloada_half
    v3: fix typo
    Reviewer: Aaron Watry
    Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
    llvm-svn: 316290
* Do not include clc_nextafter header globally | Jan Vesely | 2017-10-08 | 3 | -16/+2
    Drop unused clc/math/clc_nextafter.h header
    Reviewer: Jeroen Ketema
    Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
    llvm-svn: 315190
* math/nextafter: Use custom declaration inc file | Jan Vesely | 2017-10-08 | 2 | -4/+2
    Reviewer: Jeroen Ketema
    Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
    llvm-svn: 315189
* math/binary_decl.inc: Do not declare mixed float/double functions | Jan Vesely | 2017-10-08 | 1 | -5/+1
    fmin/fmax only need vector/scalar mix
    Reviewer: Jeroen Ketema
    Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
    llvm-svn: 315188
* Let get_work_dim take exactly 0 arguments | Jeroen Ketema | 2017-10-01 | 1 | -1/+1
    Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
    llvm-svn: 314634
* Do not circularly define NULL | Jeroen Ketema | 2017-10-01 | 1 | -1/+1
    Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
    llvm-svn: 314633
* geometric: geometric functions are only supported for vector lengths <=4 | Jan Vesely | 2017-09-29 | 1 | -16/+0
    Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
    Reviewed-by: Aaron Watry <awatry@gmail.com>
    llvm-svn: 314545
* Implement cl_khr_int64_extended_atomics builtins | Jan Vesely | 2017-09-20 | 6 | -0/+29
    Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
    Reviewed-by: Aaron Watry <awatry@gmail.com>
    Tested-by: Aaron Watry <awatry@gmail.com>
    llvm-svn: 313811
* Implement cl_khr_int64_base_atomics builtins | Jan Vesely | 2017-09-20 | 7 | -0/+34
    Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
    Reviewed-by: Aaron Watry <awatry@gmail.com>
    Tested-by: Aaron Watry <awatry@gmail.com>
    llvm-svn: 313810
* Add native_recip(x) as ((1)/(x)) | Aaron Watry | 2017-09-13 | 2 | -0/+2
    Signed-off-by: Aaron Watry <awatry@gmail.com>
    Acked-by: Jan Vesely <jan.vesely@rutgers.edu>
    llvm-svn: 313107
* integer: Add popcount implementation using ctpop intrinsic | Aaron Watry | 2017-09-09 | 3 | -0/+27
    Also copy/modify the unary_intrin.inc from math/ to make the intrinsic declaration somewhat reusable.
    Passes CL CTS integer_ops/test_integer_ops popcount tests for CL 1.2
    Tested-by on GCN 1.0 (Pitcairn)
    Signed-off-by: Aaron Watry <awatry@gmail.com>
    Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
    llvm-svn: 312854
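The popcount entry above maps the builtin onto a population-count intrinsic. As a rough host-side illustration only (plain C, not the libclc source), the sketch below pairs a compiler builtin with a bit-clearing reference loop. The names `popcount_sketch` and `popcount_reference` are hypothetical, and `__builtin_popcount` is assumed to be available because it is a GCC/Clang builtin.

```c
#include <stdint.h>

/* Reference loop: clears the lowest set bit on each pass and counts the passes. */
unsigned popcount_reference(uint32_t x) {
    unsigned n = 0;
    while (x) {
        x &= x - 1; /* drop the lowest set bit */
        n++;
    }
    return n;
}

/* Prefer the compiler builtin where it exists; otherwise fall back to the loop. */
unsigned popcount_sketch(uint32_t x) {
#if defined(__GNUC__) || defined(__clang__)
    return (unsigned)__builtin_popcount(x);
#else
    return popcount_reference(x);
#endif
}
```

Both functions should agree on every input; the builtin simply lets the compiler pick a native instruction when the target has one.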
* Implement vload_half{,n} and vload(half) | Jan Vesely | 2017-09-08 | 1 | -20/+35
    v2: add vload(half) as well
        make helpers amdgpu specific (NVPTX uses different private AS numbering)
        use clang builtin on clang >= 6
    Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
    Reviewed-by: Tom Stellard <tstellar@redhat.com>
    llvm-svn: 312839
* vstore: Cleanup and add vstore(half) | Jan Vesely | 2017-09-08 | 1 | -1/+10
    Add missing undefs
    Make helpers amdgpu specific (NVPTX uses different numbering for private AS)
    Use clang builtins on clang >= 6
    Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
    Reviewed-by: Tom Stellard <tstellar@redhat.com>
    llvm-svn: 312838
* Fixup clc.h comment | Jan Vesely | 2017-09-04 | 1 | -2/+1
    Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
    llvm-svn: 312491
* relational: Implement shuffle2 builtin | Aaron Watry | 2017-09-02 | 2 | -0/+48
    This was added in CL 1.1
    Tested with a Radeon HD 7850 (Pitcairn) using the CL CTS via:
      test_conformance/relationals/test_relationals shuffle_built_in_dual_input
    v2: Add half support to shuffle2
        Move shuffle2 to misc/
    Signed-off-by: Aaron Watry <awatry@gmail.com>
    Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
    llvm-svn: 312404
* relational: Implement shuffle builtin | Aaron Watry | 2017-09-02 | 2 | -0/+50
    This was added in CL 1.1
    Tested with a Radeon HD 7850 (Pitcairn) using the CL CTS via:
      test_conformance/relationals/test_relationals shuffle_built_in
    v2: Add half-precision support to shuffle when available.
        Move to misc/ and add section 6.12.12 to clc.h
    Signed-off-by: Aaron Watry <awatry@gmail.com>
    Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
    llvm-svn: 312403
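For readers unfamiliar with the two shuffle builtins, the following host-side C sketch approximates their selection rule as I understand it from the OpenCL specification: each mask element picks one input element, and only the low bits of the mask are considered. It is a simplification, not the libclc code. It fixes both input and output at four floats, and `shuffle_sketch`, `shuffle2_sketch`, and `SKETCH_N` are hypothetical names.

```c
#include <stddef.h>

#define SKETCH_N 4u /* assumed vector width; must be a power of two */

/* One input: each mask element selects x[mask mod N]. */
void shuffle_sketch(const float x[], const unsigned mask[], float out[]) {
    for (size_t i = 0; i < SKETCH_N; i++)
        out[i] = x[mask[i] & (SKETCH_N - 1u)];
}

/* Two inputs: indices 0..N-1 select from x, N..2N-1 select from y. */
void shuffle2_sketch(const float x[], const float y[],
                     const unsigned mask[], float out[]) {
    for (size_t i = 0; i < SKETCH_N; i++) {
        unsigned idx = mask[i] & (2u * SKETCH_N - 1u);
        out[i] = idx < SKETCH_N ? x[idx] : y[idx - SKETCH_N];
    }
}
```

With `x = {10, 11, 12, 13}` and mask `{3, 2, 1, 0}`, `shuffle_sketch` reverses the vector; mask bits above the ones needed to index the input are ignored.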
* Add halfN types and enable fp16 when generating builtin declarations | Aaron Watry | 2017-09-02 | 2 | -0/+12
    Uses the same mechanism to enable fp16 as we use for fp64 when processing clc.h
    Signed-off-by: Aaron Watry <awatry@gmail.com>
    Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
    llvm-svn: 312402
* amdgcn: Implement {read_,write_,}mem_fence builtin | Jan Vesely | 2017-08-16 | 2 | -0/+6
    v2: add more detailed comment about waitcnt instruction
    Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
    Reviewed-by: Aaron Watry <awatry@gmail.com>
    Tested-by: Aaron Watry <awatry@gmail.com>
    llvm-svn: 311021
* add __kernel_exec macros | Jan Vesely | 2017-07-28 | 4 | -11/+19
    also consolidate macros into one file, and rename to clcmacros.h
    Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
    Reviewed-by: Aaron Watry <awatry@gmail.com>
    llvm-svn: 309358
* generic: add missing get_work_dim include | Jan Vesely | 2017-06-02 | 1 | -0/+1
    Fixes a few piglits since clang r304193
    Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
    Reviewed-by: Aaron Watry <awatry@gmail.com>
    llvm-svn: 304556
* math: Implement sinh function | Jan Vesely | 2017-02-25 | 3 | -0/+48
    mostly copied from amd_builtins
    llvm-svn: 296233
* math: Add native_tan as wrapper to tan | Aaron Watry | 2017-02-23 | 2 | -0/+11
    Trivially define native_tan as a redirect to tan.
    If there are any targets with a native implementation, we can deal with it later.
    Signed-off-by: Aaron Watry <awatry@gmail.com>
    Reviewed-by: Matt Arsenault <arsenm2@gmail.com>
    llvm-svn: 295920
* Add the correct prefixes to the cl_khr_fp64 pragma | Jeroen Ketema | 2017-02-12 | 1 | -1/+1
    llvm-svn: 294915
* math: Add native_rsqrt builtin function | Matt Arsenault | 2017-02-09 | 2 | -0/+2
    Trivial define to rsqrt.
    Patch by Vedran Miletić <vedran@miletic.net>
    llvm-svn: 294608
* math: Add logb builtin | Aaron Watry | 2017-01-18 | 3 | -0/+4
    Ported from the amd-builtins branch.
    Signed-off-by: Aaron Watry <awatry@gmail.com>
    Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com>
    CC: Tom Stellard <thomas.stellard@amd.com>
    llvm-svn: 292335
* math: Add expm1 builtin function | Aaron Watry | 2017-01-18 | 2 | -0/+10
    Ported from the amd-builtins branch.
    Signed-off-by: Aaron Watry <awatry@gmail.com>
    Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com>
    CC: Tom Stellard <thomas.stellard@amd.com>
    llvm-svn: 292334
* math: Implement tgamma | Aaron Watry | 2016-09-15 | 3 | -0/+5
    Signed-off-by: Aaron Watry <awatry@gmail.com>
    Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
    llvm-svn: 281566
* math: Implement lgamma | Aaron Watry | 2016-09-15 | 3 | -0/+4
    Just use lgamma_r and ignore the value returned in the second argument
    Signed-off-by: Aaron Watry <awatry@gmail.com>
    Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
    llvm-svn: 281565
* math: Implement lgamma_r | Aaron Watry | 2016-09-15 | 3 | -0/+6
    Ported from the amd-builtins branch, which is itself based on the Sun Microsystems implementation.
    Signed-off-by: Aaron Watry <awatry@gmail.com>
    Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
    llvm-svn: 281564
* Implement vstore_half{,n} | Jan Vesely | 2016-08-17 | 1 | -19/+26
    Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
    llvm-svn: 278962
* Implement cbrt builtin | Tom Stellard | 2016-07-22 | 3 | -0/+48
    This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests.
    llvm-svn: 276497
* Implement cosh builtin | Tom Stellard | 2016-07-22 | 3 | -0/+48
    This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests.
    llvm-svn: 276496
* geometric/floatn.inc: Add vec8 and vec16 types | Tom Stellard | 2016-07-22 | 1 | -0/+16
    llvm-svn: 276495
* AMDGPU: Implement get_global_offset builtin | Jan Vesely | 2016-07-22 | 2 | -0/+2
    Also fix get_global_id to consider offset
    No idea how to add this for ptx, so they are stuck with the old get_global_id implementation.
    v2: split to a separate patch
    v3: Switch R600 to use implicitarg.ptr
    Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
    llvm-svn: 276443
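The entry above notes that `get_global_id` has to account for the global offset. The relationship, as the OpenCL execution model describes it for a single dimension, can be sketched on the host as below. This is an illustration, not the AMDGPU implementation; the struct and function names are hypothetical.

```c
#include <stddef.h>

/* Per-dimension state of one work-item (hypothetical container for the sketch). */
struct work_item_sketch {
    size_t group_id;      /* index of the work-group */
    size_t local_size;    /* work-items per work-group */
    size_t local_id;      /* index inside the work-group */
    size_t global_offset; /* offset passed at enqueue time */
};

/* global id = group id * local size + local id + global offset */
size_t global_id_sketch(const struct work_item_sketch *w) {
    return w->group_id * w->local_size + w->local_id + w->global_offset;
}
```

An implementation that drops the last term returns correct results only when the kernel is enqueued with a zero offset.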
* math: Add erf ported from amd-builtins | Jan Vesely | 2016-05-06 | 2 | -0/+10
    The scalar float/double function bodies are a direct copy/paste, aside from the removed (optional) code in float function body that requires subnormals.
    reviewers: jvesely
    Patch by: Vedran Miletić <rivanvx@gmail.com>
    llvm-svn: 268766
* math: Add fdim implementation | Aaron Watry | 2016-05-06 | 3 | -0/+4
    Based on the amd-builtin, but explicitly vectorized for all sizes (not just float4), and includes a vectorized double implementation.
    Passes piglit (float) tests on pitcairn.
    Signed-off-by: Aaron Watry <awatry@gmail.com>
    Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
    llvm-svn: 268708
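For reference, `fdim(x, y)` is the positive difference: `x - y` when `x > y`, otherwise `+0`, with NaN propagated. The host-side C sketch below shows a scalar version and an element-wise loop standing in for the vector widths mentioned above. It is not the libclc code, and `fdim_sketch` and `fdim_vec_sketch` are hypothetical names.

```c
#include <math.h>
#include <stddef.h>

/* Scalar positive difference with NaN propagation. */
float fdim_sketch(float x, float y) {
    if (isnan(x) || isnan(y))
        return NAN;
    return x > y ? x - y : 0.0f;
}

/* Element-wise form: applies the scalar rule to n lanes of any width. */
void fdim_vec_sketch(const float *x, const float *y, float *out, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = fdim_sketch(x[i], y[i]);
}
```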
* math: Add ilogb ported from amd-builtins | Aaron Watry | 2016-02-23 | 4 | -0/+10
    The scalar float/double function bodies are a direct copy/paste with usage of the CLC wrappers to vectorize them.
    This commit also adds in the FP_ILOGB0 and FP_ILOGBNAN macros which are equal to the results of ilogb(0.0f) and ilogb(float nan) respectively.
    v2: Add FP_ILOGB0 and FP_ILOGBNAN definitions
    Signed-off-by: Aaron Watry <awatry@gmail.com>
    Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
    v1 Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
    llvm-svn: 261639
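To make the role of `FP_ILOGB0` and `FP_ILOGBNAN` concrete, here is a host-side C sketch of a single-precision `ilogb` that extracts the exponent from the bit pattern. It assumes IEEE-754 binary32 `float`, takes the two macros from the host `<math.h>` rather than from libclc, and returns `INT_MAX` for infinities as the C standard does. The name `ilogb_sketch` is hypothetical, and this is not the ported AMD implementation.

```c
#include <limits.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

int ilogb_sketch(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);          /* reinterpret without aliasing issues */
    uint32_t exponent = (bits >> 23) & 0xFFu; /* biased exponent field */
    uint32_t mantissa = bits & 0x7FFFFFu;     /* fraction field */

    if (exponent == 0xFFu)                    /* infinity or NaN */
        return mantissa ? FP_ILOGBNAN : INT_MAX;
    if (exponent == 0) {
        if (mantissa == 0)                    /* +0 or -0 */
            return FP_ILOGB0;
        /* Subnormal: value is mantissa * 2^-149, so find the top set bit. */
        int e = -149;
        while (mantissa >>= 1)
            e++;
        return e;
    }
    return (int)exponent - 127;               /* normal number: remove the bias */
}
```

The two special cases are exactly where the macros come in: callers compare the result against `FP_ILOGB0` or `FP_ILOGBNAN` instead of hard-coding a sentinel value.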
* math: Add frexp ported from amd-builtins | Aaron Watry | 2016-02-08 | 4 | -0/+30
    The float implementation is almost a direct port from the amd-builtins, but instead of just having a scalar and float4 implementation, it has a scalar and arbitrary width vector implementation.
    The double scalar is also a direct port from AMD's builtin release.
    The double vector implementation copies the logic in the float vector implementation using the values from the double scalar version.
    Both have been tested in piglit using tests sent to that project's mailing list.
    Signed-off-by: Aaron Watry <awatry@gmail.com>
    Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
    llvm-svn: 260114
* Implement modf math builtin | Tom Stellard | 2016-01-27 | 3 | -0/+50
    V2: use the reference implementation as suggested by Matt Arsenault
    Patch By: Pavel Ondračka
    llvm-svn: 258933
* integer: remove explicit casts from _MIN definitions | Aaron Watry | 2015-10-06 | 1 | -3/+3
    The spec says (section 6.12.3, CL version 1.2):
      The macro names given in the following list must use the values specified. The values shall all be constant expressions suitable for use in #if preprocessing directives.
    This commit addresses the second part of that statement.
    Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
    Reviewed-by: Tom Stellard <tom@stellard.net>
    CC: Moritz Pflanzer <moritz.pflanzer14@imperial.ac.uk>
    CC: Serge Martin <edb+libclc@sigluy.net>
    llvm-svn: 249445
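The requirement quoted above matters because the C preprocessor does not evaluate casts: inside `#if`, a type name is just another identifier, so a cast is either rejected or misparsed. The sketch below shows cast-free definitions that stay usable in `#if`. The `SKETCH_` macro names and the helper function are hypothetical, the values assume 8-bit `char`, 16-bit `short`, and 32-bit `int`, and the actual libclc definitions may differ.

```c
#include <limits.h>

/* Cast-free forms: plain integer constant expressions, valid inside #if. */
#define SKETCH_SCHAR_MIN (-127 - 1)
#define SKETCH_SHRT_MIN  (-32767 - 1)
#define SKETCH_INT_MIN   (-2147483647 - 1)

/* The preprocessor can evaluate these directly. */
#if SKETCH_INT_MIN >= 0
#error "SKETCH_INT_MIN should be negative"
#endif

/* Returns 1 when the macros compare as expected at preprocessing time. */
int min_macros_usable_in_if(void) {
#if SKETCH_SCHAR_MIN < 0 && SKETCH_SHRT_MIN < SKETCH_SCHAR_MIN
    return 1;
#else
    return 0;
#endif
}
```

Writing the minimum as `(-MAX - 1)` also avoids the literal `2147483648`, which does not fit in a 32-bit `int` before the unary minus is applied.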
* Implement tanh builtin | Niels Ole Salscheider | 2015-09-29 | 3 | -0/+48
    This is a port from the AMD builtin library.
    llvm-svn: 248780
* Add sampler defines. | Tom Stellard | 2015-09-21 | 1 | -0/+18
    Patch by: Zoltan Gilian
    llvm-svn: 248163
* Add image attribute defines. | Tom Stellard | 2015-09-21 | 2 | -0/+32
    Patch by: Zoltan Gilian
    llvm-svn: 248162
* r600: Add image writing builtins. | Tom Stellard | 2015-09-21 | 1 | -0/+7
    Patch by: Zoltan Gilian
    llvm-svn: 248161
* r600: Add image reading builtins. | Tom Stellard | 2015-09-21 | 1 | -0/+13
    Patch by: Zoltan Gilian
    llvm-svn: 248160
* Add image attribute getter builtins | Tom Stellard | 2015-09-21 | 2 | -0/+20
    Added get_image_* OpenCL builtins to the headers.
    Added implementation to the r600 target.
    Patch by: Zoltan Gilian
    llvm-svn: 248159