summaryrefslogtreecommitdiffstats
path: root/libclc/generic/lib
Commit message (Collapse)AuthorAgeFilesLines
...
* math: Implement lgamma_rAaron Watry2016-09-153-0/+512
| | | | | | | | | Ported from the amd-builtins branch, which is itself based on the Sun Microsystems implementation. Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 281564
* Add ADDR_SPACE parameter to _CLC_V_V_VP_VECTORIZEAaron Watry2016-09-151-12/+27
| | | | | | | | | | | This macro is currently unused, but I plan to use it shortly. The previous form did casts of pointers without an address space, which doesn't work so well for CL 1.x. Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 281563
* Replace nextafter implementationMatt Arsenault2016-09-081-28/+24
| | | | | | This one passes conformance. llvm-svn: 280961
* Avoid ambiguity in calling atom_add functions.Jan Vesely2016-09-074-4/+4
| | | | | | | | | | clang (since r280553) allows pointer casts in function overloads, so we need to disambiguate the second argument. clang might be smarter about overloads in the future see https://reviews.llvm.org/D24113, but let's be safe in libclc anyway. llvm-svn: 280871
* Implement vstore_half{,n}Jan Vesely2016-08-172-0/+42
| | | | | Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 278962
* Make min follow the OCL 1.0 specsJan Vesely2016-07-251-2/+2
| | | | | | | | | | | | | OpenCL 1.0: "Returns y if y < x, otherwise it returns x. If x *and* y are infinite or NaN, the return values are undefined." OpenCL 1.1+: "Returns y if y < x, otherwise it returns x. If x *or* y are infinite or NaN, the return values are undefined." The 1.0 version is stricter so use that one. Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 276704
* Implement cbrt builtinTom Stellard2016-07-224-0/+821
| | | | | | | This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 276497
* Implement cosh builtinTom Stellard2016-07-224-0/+322
| | | | | | | This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 276496
* AMDGPU: Implement get_global_offset builtinJan Vesely2016-07-221-1/+1
| | | | | | | | | | | | | Also fix get_global_id to consider offset No idea how to add this for ptx, so they are stuck with the old get_global_id implementation. v2: split to a separate patch v3: Switch R600 to use implictarg.ptr Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 276443
* 64 bit integers are legal in full profile without an extensionJan Vesely2016-06-171-5/+12
| | | | | | Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <tom@stellard.net> llvm-svn: 273042
* math: Use single precision fmax in sp pathJan Vesely2016-05-171-1/+1
| | | | | | | | | | Fixes fdim piglit on Turks v2: use CL fmax instead of __builtin Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <tom.stellard@amd.com> llvm-svn: 269807
* math: Add erf ported from amd-builtinsJan Vesely2016-05-062-0/+403
| | | | | | | | | | | | The scalar float/double function bodies are a direct copy/paste, aside from the removed (optional) code in float function body that requires subnormals. reviewers: jvesely Patch by: Vedran Miletić <rivanvx@gmail.com> llvm-svn: 268766
* math: Add fdim implementationAaron Watry2016-05-063-0/+82
| | | | | | | | | | | Based on the amd-builtin, but explicitly vectorized for all sizes (not just float4), and includes a vectorized double implementation. Passes piglit (float) tests on pitcairn. Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 268708
* math: Fix ilogb(double) return typeAaron Watry2016-02-241-1/+1
| | | | | | Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 261714
* math: Add ilogb ported from amd-builtinsAaron Watry2016-02-232-0/+58
| | | | | | | | | | | | | | | | The scalar float/double function bodies are a direct copy/paste with usage of the CLC wrappers to vectorize them. This commit also adds in the FP_ILOGB0 and FP_ILOGBNAN macros which are equal to the results of ilogb(0.0f) and ilogb(float nan) respectively. v2: Add FP_ILOGB0 and FP_ILOGBNAN definitions Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu> v1 Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 261639
* math: Fix log2 vectorization on non-fp64 hwJan Vesely2016-02-091-0/+2
| | | | | | reviewer: tstellard Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 260301
* math: Add frexp ported from amd-builtinsAaron Watry2016-02-083-0/+121
| | | | | | | | | | | | | | | | | | The float implementation is almost a direct port from the amd-builtins, but instead of just having a scalar and float4 implementation, it has a scalar and arbitrary width vector implementation. The double scalar is also a direct port from AMD's builtin release. The double vector implementation copies the logic in the float vector implementation using the values from the double scalar version. Both have been tested in piglit using tests sent to that project's mailing list. Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 260114
* Implement modf math builtinTom Stellard2016-01-273-0/+70
| | | | | | | | V2: use the reference implementation as suggested by Matt Arsenault Patch By: Pavel Ondračka llvm-svn: 258933
* Add _CLC_V_V_VP_VECTORIZE macroTom Stellard2016-01-271-0/+22
| | | | | | Patch by: Pavel Ondračka llvm-svn: 258932
* Implement tanh builtinNiels Ole Salscheider2015-09-292-0/+147
| | | | | | This is a port from the AMD builtin library. llvm-svn: 248780
* Add image attribute getter builtinsTom Stellard2015-09-212-0/+10
| | | | | | | | | Added get_image_* OpenCL builtins to the headers. Added implementation to the r600 target. Patch by: Zoltan Gilian llvm-svn: 248159
* Remove files accidentally not removed in r244310Jeroen Ketema2015-08-132-9/+0
| | | | llvm-svn: 244987
* Fix double implementation of logTom Stellard2015-07-242-0/+27
| | | | | | We need to use M_LOG2E instead of M_LOG2E_F. llvm-svn: 243132
* Implement accurate log2 functionTom Stellard2015-07-245-0/+470
| | | | | | | | | Use the implementation was ported from the AMD builtin library rather than LLVM Intrinsics. This has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 243131
* Use llvm intrinsics for native_log and native_log2Tom Stellard2015-07-245-0/+116
| | | | llvm-svn: 243130
* Fix implementation of sqrt v2Tom Stellard2015-07-104-0/+112
| | | | | | | | | | | Passing values less than 0 to the llvm.sqrt() intrinsic results in undefined behavior, so we need to check the input and return NaN if is is less than 0. v2: - Fix build failures. llvm-svn: 241906
* Use a more accurate implementation for expTom Stellard2015-05-132-13/+85
| | | | | | | | | | | | Using exp2(x * M_LOG2E_F) does not give us accurate enough results for OpenCL. If you look at the new exp implementation you'll see that it does multiply the input by M_LOG2E_F, but it still uses the original input in part of the calculation. This exp implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 237229
* Implement exp2 using OpenCL C rather than using an intrinsicTom Stellard2015-05-136-1/+257
| | | | | | | | | | Not all targets support the intrinsic, so it's better to have a generic implementation which does not use it. This exp2 implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 237228
* Implement sin for double typesTom Stellard2015-05-121-7/+16
| | | | | | | This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 237155
* Implement cos for double typesTom Stellard2015-05-125-7/+289
| | | | | | | This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 237154
* Implement atan2pi builtinTom Stellard2015-05-122-0/+222
| | | | | | | This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 237138
* Implement atan2 for doublesTom Stellard2015-05-123-2/+412
| | | | | | | This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 237131
* math: limit half_sqrt to single precisionJan Vesely2015-05-091-4/+2
| | | | | | Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 236941
* geometric: Limit fast_{distance,length} functions to single precisionJan Vesely2015-05-092-25/+2
| | | | | | Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 236940
* Fix ldexp fp64 build errorJan Vesely2015-05-091-1/+1
| | | | | | Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Aaron Watry <awatry@gmail.com> llvm-svn: 236939
* Implement fast_normalize builtin v4Tom Stellard2015-05-093-0/+64
| | | | | | | | | | | | | | | | | This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. v2: - Remove f suffix from constant in double implementations. - Consolidate implementations using the .cl/.inc approach. v3: - Use __CLC_FPSIZE instead of __CLC_FP{32,64} v4 (Jan Vesely): - Limit to single precision. llvm-svn: 236920
* Implement half_rsqrt builtin v3Tom Stellard2015-05-083-0/+54
| | | | | | | | | | | | | This is a generic implementation which just calls rsqrt. Targets should override this if they want a faster implementation. v2: - Alphabettize SOURCES v3 (Jan Vesely): Limit to single precision types. llvm-svn: 236915
* Move ldexp soft implementation to a separate fileJan Vesely2015-05-063-100/+132
| | | | | | Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 236648
* Implement sinpi builtinJan Vesely2015-05-062-0/+132
| | | | | | | | Ported from AMD builtin library, passes piglit on Turks. Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 236647
* math: Add ldexp implementationTom Stellard2015-05-064-0/+172
| | | | | | | | | | | | Signed-off-by: Aaron Watry <awatry@gmail.com> Tom Stellard: - Add denormal handling. - Share vectorization code with r600 implementation. Patch By: Aaron Watry llvm-svn: 236639
* Fix implementation of normalize builtinTom Stellard2015-05-062-6/+152
| | | | | | | The new implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 236608
* Allow compilation depending to the LLVM versionTom Stellard2015-04-292-0/+9
| | | | | | | | | It allows to keep temporary compatibilty with older version. For exemple, this can be use when change are not to large. Patch by: EdB llvm-svn: 236113
* Fix compilation warnings without cl_khr_fp64Jan Vesely2015-04-243-6/+32
| | | | | | Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 235762
* Implement fract builtinTom Stellard2015-04-233-0/+80
| | | | | | | This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 235620
* configure: Add --enable-runtime-subnormal optionTom Stellard2015-04-205-0/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This makes it possible for runtime implementations to disable subnormal handling at runtime. When this flag is enabled, decisions about how to handle subnormals in the library will be controlled by an external variable called __CLC_SUBNORMAL_DISABLE. Function implementations should use these new helpers for querying subnormal support: __clc_fp16_subnormals_supported(); __clc_fp32_subnormals_supported(); __clc_fp64_subnormals_supported(); In order for the library to link correctly with this feature, users will be required to either: 1. Insert this variable into the module (if using the LLVM/Clang C++/C APIs). 2. Pass either subnormal_disable.bc or subnormal_use_default.bc to the linker. These files are distributed with liblclc and installed to $(installdir). e.g.: llvm-link -o kernel-out.bc kernel.bc builtins-nosubnormal.bc subnormal_disable.bc or llvm-link -o kernel-out.bc kernel.bc builtins-nosubnormal.bc subnormal_use_default.bc If you do not supply the --enable-runtime-subnormal then the library behaves the same as it did before this commit. In addition to these changes, the patch adds helper functions that should be used when implementing library functions that need special handling for denormals: __clc_fp16_subnormals_supported(); __clc_fp32_subnormals_supported(); __clc_fp64_subnormals_supported(); llvm-svn: 235329
* Implement atanh builtinTom Stellard2015-04-072-0/+114
| | | | | | | This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 234324
* Implement acosh builtinTom Stellard2015-04-072-0/+128
| | | | | | | This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 234323
* Implement atanpi builtinTom Stellard2015-04-022-0/+183
| | | | | | | This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 233928
* Implement asinpi builtinTom Stellard2015-04-022-0/+171
| | | | | | | This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 233927
* Implement asinh builtinTom Stellard2015-04-024-0/+418
| | | | | | | This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 233926
OpenPOWER on IntegriCloud