path: root/libclc/generic/lib
Commit message | Author | Age | Files | Lines
...
* Implement atan2 builtin | Tom Stellard | 2014-07-23 | 2 | -0/+82
    llvm-svn: 213760
* Implement atan builtin | Tom Stellard | 2014-07-23 | 3 | -0/+248
    llvm-svn: 213759
* relational: Implement isnotequal | Aaron Watry | 2014-07-17 | 2 | -0/+24
    v2: Use relational macros instead of hand-rolled ones
    Signed-off-by: Aaron Watry <awatry@gmail.com>
    Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
    llvm-svn: 213320
* relational: Implement isgreaterequal | Aaron Watry | 2014-07-17 | 2 | -0/+23
    v2: Use relational macros instead of hand-rolled macros
    Signed-off-by: Aaron Watry <awatry@gmail.com>
    Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
    llvm-svn: 213319
* relational: Implement isgreater | Aaron Watry | 2014-07-17 | 2 | -0/+23
    v2: Use relational macros instead of hand-rolled macros
    Signed-off-by: Aaron Watry <awatry@gmail.com>
    Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
    llvm-svn: 213318
* relational/signbit: Refactor to use relational macros | Aaron Watry | 2014-07-17 | 1 | -70/+2
    Signed-off-by: Aaron Watry <awatry@gmail.com>
    Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
    llvm-svn: 213317
* Fix isnan definition for vector results | Aaron Watry | 2014-07-17 | 1 | -3/+3
    Vector true is -1, not 1, which means we need to use the relational
    unary macro instead of the normal unary builtin one.
    Signed-off-by: Aaron Watry <awatry@gmail.com>
    Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
    llvm-svn: 213316
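The scalar/vector distinction described in this commit can be sketched in plain C. This is only an illustration, not libclc's macro-based code: `sketch_int4` and both function names are hypothetical stand-ins for OpenCL's `int4` and `isnan` overloads.

```c
#include <math.h>   /* NAN, used in the usage note below */
#include <stdint.h>

/* Scalar relational result: 1 for true, 0 for false. */
static int sketch_isnan_scalar(float x) {
    return x != x; /* NaN is the only value that compares unequal to itself */
}

/* Hypothetical 4-lane result type standing in for OpenCL's int4. */
typedef struct { int32_t s[4]; } sketch_int4;

/* Vector relational result: -1 (all bits set) in each true lane, 0 otherwise. */
static sketch_int4 sketch_isnan_vec4(const float x[4]) {
    sketch_int4 r;
    for (int i = 0; i < 4; ++i)
        r.s[i] = -(int32_t)(x[i] != x[i]);
    return r;
}
```

For example, `sketch_isnan_scalar(NAN)` yields 1, while the NaN lanes of `sketch_isnan_vec4` hold -1, so a vector result can be used directly as a bit mask.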
* relational: create re-usable macros for relational declarations | Aaron Watry | 2014-07-17 | 1 | -0/+117
    relational.h includes relational macros for defining functions which
    need to return 1 for scalar true and -1 for vector true.

    I believe that this is the only place that this behavior is required,
    so the macro is placed at its lowest useful level (same directory as
    it is used in).

    This also creates re-usable unary/binary declaration and floatn
    includes which should simplify relational builtin declarations.
    Mostly patterned off of include/math/[binary_decl|unary_decl|floatn].inc
    but with required changes for relational functions.

    Signed-off-by: Aaron Watry <awatry@gmail.com>
    Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
    llvm-svn: 213315
* relational: Fix signbit | Aaron Watry | 2014-06-25 | 1 | -7/+7
    The vector components were mistakenly using () instead of {}, which
    caused all but the last vector component to be dropped on the floor.
    Signed-off-by: Aaron Watry <awatry@gmail.com>
    Reviewed-by: Jeroen Ketema <j.ketema@imperial.ac.uk>
    llvm-svn: 211733
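The log does not show the offending code, so the following is only a plain-C illustration of the general pitfall: a parenthesized list is a comma expression that evaluates to its last operand, whereas a braced list initializes every element. Both function names are hypothetical.

```c
#include <assert.h>

/* Braces: each value initializes its own lane; return the requested lane. */
static int sketch_brace_lane(int a, int b, int c, int d, int lane) {
    int v[4] = { a, b, c, d };
    assert(lane >= 0 && lane < 4);
    return v[lane];
}

/* Parentheses: (a, b, c, d) is a comma expression, so only d survives.
 * The (void) casts merely silence "unused value" warnings. */
static int sketch_comma_value(int a, int b, int c, int d) {
    return ((void)a, (void)b, (void)c, d);
}
```

With inputs 1, 2, 3, 4 the braced form keeps all four values, while the parenthesized form collapses to 4, which matches the symptom described in the commit message.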
* relational: Implement signbit | Aaron Watry | 2014-06-25 | 2 | -0/+88
    v2 Changes:
    - use __builtin_signbit instead of shifting by hand
    - significantly improve vector shuffling
    - Works correctly now for signbit(float16) on radeonsi
    Signed-off-by: Aaron Watry <awatry@gmail.com>
    Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
    llvm-svn: 211696
* Add exp10 | Jeroen Ketema | 2014-06-25 | 3 | -0/+19
    Reviewed-by: Tom Stellard <tom@stellard.net>
    llvm-svn: 211680
* Move clcmacro.h to avoid cluttering user namespace v2 | Jeroen Ketema | 2014-06-24 | 9 | -0/+62
    v2:
    - use quotes instead of <>
    - add include to r600/lib/math/nextafter.c changed
    Reviewed-by: Tom Stellard <tom@stellard.net>
    Reviewed-by: Aaron Watry <awatry@gmail.com>
    llvm-svn: 211576
* Protect functions taking double by #ifdef cl_khr_fp64 | Jeroen Ketema | 2014-06-23 | 1 | -2/+6
    Also change the order of the functions to be consistent with the
    order in the header files.
    llvm-svn: 211496
* Add pown | Jeroen Ketema | 2014-06-18 | 2 | -0/+10
    Reviewed-by: Tom Stellard <tom@stellard.net>
    llvm-svn: 211211
* Fix definition of INFINITY and add NAN/HUGE_VAL[F] | Aaron Watry | 2014-06-16 | 1 | -2/+0
    v3: change __builtin_nanf() to __builtin_nanf("")
    This doesn't work yet, but it was agreed to commit as-is with the
    logic that "broken" is better than "completely missing" and this
    should be fixed in clang.
    v2: use __builtin_inff() and also add nan/huge_val definitions
    Signed-off-by: Aaron Watry <awatry@gmail.com>
    llvm-svn: 211065
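As a rough illustration of the builtin-based definitions this commit mentions, the macros below use the GCC/Clang builtins named in the message, plus `__builtin_huge_valf`, which is an assumption here. The `SKETCH_` names are hypothetical, and this is host C rather than the OpenCL header the commit touched.

```c
/* Hypothetical macro names; requires a GCC- or Clang-compatible compiler. */
#define SKETCH_INFINITY  __builtin_inff()      /* positive infinity as a float */
#define SKETCH_NAN       __builtin_nanf("")    /* quiet NaN; note the "" argument from v3 */
#define SKETCH_HUGE_VALF __builtin_huge_valf() /* float overflow value, i.e. +inf */

/* Small helper showing that the NaN macro behaves like a NaN:
 * it compares unequal to itself. Returns 1 when that holds. */
static int sketch_nan_is_unordered(void) {
    float n = SKETCH_NAN;
    return n != n;
}
```

On a host compiler these expand to ordinary IEEE-754 values. The commit message itself notes that the OpenCL definitions did not yet work in clang at the time.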
* math: Implement mix builtin | Aaron Watry | 2014-06-16 | 3 | -0/+18
    Signed-off-by: Aaron Watry <awatry@gmail.com>
    Reviewed-by: Tom Stellard <tom@stellard.net>
    llvm-svn: 211047
* relational: Add isequal(floatN) builtin | Aaron Watry | 2014-06-16 | 2 | -0/+31
    Signed-off-by: Aaron Watry <awatry@gmail.com>
    Reviewed-by: Tom Stellard <tom@stellard.net>
    llvm-svn: 211046
* Add all(igentype) builtin | Aaron Watry | 2014-06-16 | 2 | -0/+30
    Signed-off-by: Aaron Watry <awatry@gmail.com>
    Reviewed-by: Tom Stellard <tom@stellard.net>
    llvm-svn: 211045
* Add files forgotten in the previous commit | Jeroen Ketema | 2014-06-13 | 2 | -0/+18
    llvm-svn: 210896
* Implementations for exp(float) and exp(double) v2 | Jeroen Ketema | 2014-06-13 | 1 | -0/+1
    Use separate implementations instead of a macro to ensure the
    constant multiplied with is of higher precision.
    v2: Use the correct formula, spotted by Dan Liew <daniel.liew@imperial.ac.uk>
    Reviewed-by: Aaron Watry <awatry@gmail.com>
    Reviewed-by: Tom Stellard <tom@stellard.net>
    llvm-svn: 210891
* Add sincos | Tom Stellard | 2014-03-21 | 3 | -0/+20
    Patch by: Jeroen Ketema
    Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
    llvm-svn: 204478
* Add cross for double3 and double4 | Tom Stellard | 2014-03-21 | 1 | -0/+14
    Patch by: Jeroen Ketema
    Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
    llvm-svn: 204477
* Implement builtins for cl_khr_global_int32_base_atomics extension | Tom Stellard | 2013-11-18 | 5 | -0/+40
    llvm-svn: 195021
* s/_CLC_DECL/_CLC_DEF/ | Tom Stellard | 2013-10-31 | 2 | -14/+14
    Some function definitions were using _CLC_DECL, which meant that
    they weren't being marked as always_inline.
    Reviewed-by and Tested-by: Aaron Watry <awatry@gmail.com>
    llvm-svn: 193754
* Port pocl's gen_convert.py script to libclc | Tom Stellard | 2013-10-10 | 2 | -149/+390
    This script generates implementations for the entire set of
    convert_* functions.
    llvm-svn: 192385
* Implement sign() builtin | Tom Stellard | 2013-10-10 | 2 | -0/+28
    llvm-svn: 192384
* Implement nextafter() builtin | Tom Stellard | 2013-10-10 | 3 | -0/+55
    There are two implementations of nextafter():
    1. Using clang's __builtin_nextafter. Clang replaces this builtin
       with a call to nextafter which is part of libm. Therefore, this
       implementation will only work for targets with an implementation
       of libm (e.g. most CPU targets).
    2. The other implementation is written in OpenCL C. This function
       is known internally as __clc_nextafter and can be used by targets
       that don't have access to libm.
    llvm-svn: 192383
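A libm-free nextafter, in the spirit of the second implementation described above, can be sketched by stepping the IEEE-754 bit pattern. This is an assumption-laden illustration in host C (binary32 `float`, hypothetical name `sketch_nextafterf`), not the actual __clc_nextafter source.

```c
#include <stdint.h>
#include <string.h>

/* Returns the next representable float after x in the direction of y,
 * assuming float is IEEE-754 binary32. */
static float sketch_nextafterf(float x, float y) {
    uint32_t bits;
    if (x != x || y != y) return x + y;  /* propagate NaN */
    if (x == y) return y;                /* already there */
    if (x == 0.0f) {
        /* Step off zero to the smallest subnormal with y's sign. */
        bits = (y < 0.0f) ? 0x80000001u : 0x00000001u;
    } else {
        memcpy(&bits, &x, sizeof bits);
        /* Moving away from zero increments the magnitude bits,
         * moving toward zero decrements them. */
        if ((x < y) == (x > 0.0f)) bits += 1u;
        else                       bits -= 1u;
    }
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

Because adjacent floats of the same sign have adjacent bit patterns, a single integer increment or decrement is enough, and stepping past the largest finite value lands on infinity.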
* Implement isnan() builtin | Tom Stellard | 2013-10-10 | 2 | -0/+18
    llvm-svn: 192382
* Add atomic_sub and atomic_dec builtin functions | Aaron Watry | 2013-09-06 | 1 | -0/+12
    Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
    llvm-svn: 190201
* Add atomic_inc and atomic_add builtins | Aaron Watry | 2013-09-05 | 2 | -0/+12
    Reviewed-by: Aaron Watry <awatry@gmail.com>
    llvm-svn: 190058
* Add mul_hi implementation [v2] | Aaron Watry | 2013-08-19 | 2 | -0/+110
    Everything except long/ulong is handled by just casting to the next
    larger type, doing the math and then shifting/casting the result.

    For 64-bit types, we break the high/low parts of each operand apart,
    and do a FOIL-based multiplication.

    v2: Discard the stack-overflow implementation due to copyright concerns.
    - The implementation is still FOIL-based, but discards the previous code.

    Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
    llvm-svn: 188684
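The two strategies described in this message can be sketched in C as follows. The function names are hypothetical and the code is an independent illustration of the widen-and-shift and FOIL (first, outer, inner, last) ideas, not the committed implementation.

```c
#include <stdint.h>

/* Narrow types: widen, multiply, and keep the upper half. */
static uint32_t sketch_mul_hi_u32(uint32_t x, uint32_t y) {
    return (uint32_t)(((uint64_t)x * (uint64_t)y) >> 32);
}

/* 64-bit types: split each operand into 32-bit halves and combine the
 * four partial products, tracking the carry out of the low 64 bits. */
static uint64_t sketch_mul_hi_u64(uint64_t x, uint64_t y) {
    uint64_t x_lo = x & 0xFFFFFFFFu, x_hi = x >> 32;
    uint64_t y_lo = y & 0xFFFFFFFFu, y_hi = y >> 32;

    uint64_t first = x_hi * y_hi; /* contributes to bits 64..127 */
    uint64_t outer = x_hi * y_lo; /* contributes to bits 32..95  */
    uint64_t inner = x_lo * y_hi; /* contributes to bits 32..95  */
    uint64_t last  = x_lo * y_lo; /* contributes to bits 0..63   */

    /* Sum everything that lands in bits 32..63; its overflow above
     * bit 31 is the carry into the high word. Cannot overflow 64 bits. */
    uint64_t mid = (last >> 32) + (outer & 0xFFFFFFFFu) + (inner & 0xFFFFFFFFu);

    return first + (outer >> 32) + (inner >> 32) + (mid >> 32);
}
```

As a sanity check, multiplying the maximum 64-bit value by itself gives a high word of 0xFFFFFFFFFFFFFFFE, since (2^64 - 1)^2 = 2^128 - 2^65 + 1.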
* Add rhadd builtin | Aaron Watry | 2013-08-15 | 3 | -0/+11
    rhadd = (x+y+1)>>1
    Implemented as:
    (x>>1) + (y>>1) + ((x&1)|(y&1))
    This prevents us having to do assembly addition and overflow detection
    Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
    llvm-svn: 188477
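The formula in this message translates directly to C. The sketch below uses 8-bit operands and a hypothetical function name to show that the result never needs a wider intermediate type.

```c
#include <stdint.h>

/* Rounded halving add: equals (x + y + 1) >> 1 computed without overflow.
 * Each operand is halved first; the rounding term is 1 whenever at least
 * one of the discarded low bits was set. */
static uint8_t sketch_rhadd_u8(uint8_t x, uint8_t y) {
    return (uint8_t)((x >> 1) + (y >> 1) + ((x & 1) | (y & 1)));
}
```

For instance, `sketch_rhadd_u8(255, 255)` is 127 + 127 + 1 = 255, where the naive `(x + y + 1) >> 1` would wrap if evaluated in 8 bits.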
* Add hadd builtin | Aaron Watry | 2013-08-15 | 3 | -0/+11
    (x + y) >> 1 gets changed to:
    (x>>1) + (y>>1) + (x&y&1)
    Saves us having to do any llvm assembly and overflow checking in the
    addition.
    Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
    llvm-svn: 188476
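The truncating variant from this message can be sketched the same way. The name is hypothetical and 8-bit operands are used only to keep the overflow case easy to see.

```c
#include <stdint.h>

/* Halving add: equals (x + y) >> 1 computed without overflow.
 * The correction term is 1 only when both discarded low bits were set,
 * because only then do they add up to a full unit after halving. */
static uint8_t sketch_hadd_u8(uint8_t x, uint8_t y) {
    return (uint8_t)((x >> 1) + (y >> 1) + (x & y & 1));
}
```

For instance, `sketch_hadd_u8(254, 255)` is 127 + 127 + 0 = 254, matching 509 >> 1 without ever forming the 9-bit sum.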
* Add intN vloadN() implementations for address spaces 3 and 4 | Aaron Watry | 2013-08-12 | 1 | -0/+60
    Not hooked up to R600 yet due to current lack of support, at least on EG.
    Signed-off-by: Aaron Watry <awatry@gmail.com>
    Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
    llvm-svn: 188181
* Add vload* for addrspace(2) and use as constant load for R600 | Aaron Watry | 2013-08-12 | 2 | -2/+34
    Signed-off-by: Aaron Watry <awatry@gmail.com>
    Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
    llvm-svn: 188179
* Add some missing convert_* functions | Tom Stellard | 2013-08-10 | 1 | -11/+38
    llvm-svn: 188131
* Implement generic upsample() | Aaron Watry | 2013-07-19 | 2 | -0/+35
    Reduces all vector upsamples down to its scalar components, so
    probably not the most efficient thing in the world, but it does what
    the spec says it needs to do.

    Another possible implementation would be to convert/cast everything
    as unsigned if necessary, upsample the input vectors, create the
    upsampled value, and then cast back to signed if required.

    Signed-off-by: Aaron Watry <awatry@gmail.com>
    Reviewed-by: Tom Stellard <thomas.stellard at amd.com>
    llvm-svn: 186691
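The scalar-by-scalar approach described above can be sketched in C for the 8-bit to 16-bit case. All names are hypothetical, `sketch_ushort4` stands in for OpenCL's `ushort4`, and the signed variant follows the "work in unsigned, then cast back" idea from the message.

```c
#include <stdint.h>

/* Scalar upsample: hi becomes the upper byte, lo the lower byte. */
static uint16_t sketch_upsample_u8(uint8_t hi, uint8_t lo) {
    return (uint16_t)(((uint16_t)hi << 8) | lo);
}

/* Signed variant: combine as unsigned, then convert back to signed.
 * The final conversion is implementation-defined in C for values above
 * INT16_MAX; common compilers wrap to two's complement. */
static int16_t sketch_upsample_s8(int8_t hi, uint8_t lo) {
    return (int16_t)sketch_upsample_u8((uint8_t)hi, lo);
}

/* Hypothetical 4-lane type standing in for OpenCL's ushort4. */
typedef struct { uint16_t s[4]; } sketch_ushort4;

/* Vector upsample reduced to its scalar components, lane by lane. */
static sketch_ushort4 sketch_upsample_u8x4(const uint8_t hi[4], const uint8_t lo[4]) {
    sketch_ushort4 r;
    for (int i = 0; i < 4; ++i)
        r.s[i] = sketch_upsample_u8(hi[i], lo[i]);
    return r;
}
```

For example, `sketch_upsample_u8(0x12, 0x34)` gives 0x1234, and the vector form simply repeats that per lane.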
* Fix and re-enable R600 vload/vstore assembly | Aaron Watry | 2013-07-16 | 2 | -56/+35
    The assembly optimizations were making unsafe assumptions about which
    address spaces had which identifiers.

    Also, fix vload/vstore with 64-bit pointers. This was broken
    previously on Radeon SI.

    This version still only has assembly versions of int/uint 2/4/8/16
    for global loads and stores on R600, but it does it in a way that
    would be very easily extended to private/local/constant and could
    also be handled easily on other architectures.

    v2:
    1) Leave v[load|store]_impl.ll in generic/lib
    2) Remove vload_if.ll and vstore_if.ll interfaces
    3) Fix address+offset calculations
    4) Remove offset from assembly arg list

    llvm-svn: 186416
* libclc: vload/vstore disable assembly and fix offset calculation | Aaron Watry | 2013-07-16 | 5 | -245/+20
    This commit gets us back to pure CLC and fixes offset calculations.

    The next commit will re-enable the assembly implementation for R600,
    fix bugs related to 64-bit address spaces, and also fix the incorrect
    assumption that address space identifiers are the same in all
    architectures.

    llvm-svn: 186415
* Implement mad24() and mul24() builtins | Tom Stellard | 2013-07-08 | 5 | -0/+24
    Reviewed-by: Aaron Watry <awatry@gmail.com>
    llvm-svn: 185839
* Add __CLC_ prefix to all macro definitions in headers | Tom Stellard | 2013-07-08 | 25 | -68/+68
    libclc was defining and undefining GENTYPE and several other macros
    with common names in its header files. This was preventing
    applications from defining macros with identical names as command
    line arguments to the compiler, because the definitions in the header
    files were masking the macros defined as compiler arguments.
    Reviewed-by: Aaron Watry <awatry@gmail.com>
    llvm-svn: 185838
* libclc: Add assembly versions of vstore for global [u]int4/8/16 | Tom Stellard | 2013-06-26 | 4 | -6/+168
    The assembly should be generic, but at least currently R600 only
    supports 32-bit stores of [u]int1/4, and I believe that only global
    is well-supported. R600 lowers the 8/16 component stores to multiple
    4-component stores.

    The unoptimized C versions of the other stuff are left in place.

    Patch by: Aaron Watry
    llvm-svn: 185009
* libclc: Add assembly versions of vload for global int4/8/16 | Tom Stellard | 2013-06-26 | 4 | -2/+162
    The assembly should be generic, but at least currently R600 only
    supports 32-bit loads of int1/4, and I believe that only global is
    well-supported. R600 lowers the 8/16 component vectors to multiple
    4-component loads.

    The unoptimized C versions of the other stuff are left in place.

    Patch by: Aaron Watry
    llvm-svn: 185008
* libclc: Initial vstore implementation | Tom Stellard | 2013-06-26 | 2 | -0/+57
    Assumes that the target supports byte-addressable stores.
    Completely unoptimized.
    Patch by: Aaron Watry
    llvm-svn: 185007
* libclc: Initial vload implementation | Tom Stellard | 2013-06-26 | 2 | -0/+48
    Should work for all targets and data types. Completely unoptimized.
    Patch by: Aaron Watry
    llvm-svn: 185006
* libclc: Implement clz() builtin | Tom Stellard | 2013-06-26 | 4 | -0/+154
    Squashed commit of the following:

    commit a0df0a0e86c55c1bdc0b9c0f5a739e5adef4b056
    Author: Aaron Watry <awatry@gmail.com>
    Date: Mon Apr 15 18:42:04 2013 -0500

        libclc: Rename clz.ll to clz_if.ll to ensure it gets built.

        configure.py treats files that have the same name with the .cl
        and .ll extensions as overriding each other. E.g. If you have
        clz.cl and clz.ll both specified to be built in the same SOURCES
        file, only the first file listed will actually be built. Since
        the contents of clz.ll were an interface that is implemented in
        clz_impl.ll, rename clz.ll to clz_if.ll to make sure that the
        interface is built.

    commit 931b62bed05c58f737de625bd415af09571a6a5a
    Author: Aaron Watry <awatry@gmail.com>
    Date: Sat Apr 13 12:32:54 2013 -0500

        libclc: llvm assembly implementation of clz

        Untested... currently crashes in the same manner as add_sat.

    commit 6ef0b7b0b6d2e5584086b4b9a9243743b2e0538f
    Author: Aaron Watry <awatry@gmail.com>
    Date: Sat Mar 23 12:35:27 2013 -0500

        libclc: Add stub clz builtin

        For scalar int/uint, attempt to use the clz llvm builtin.. for
        all others return 0 until an actual implementation is finished.

    Patch by: Aaron Watry
    llvm-svn: 185004
* libclc: Add clamp(vec, scalar, scalar) and max(vec, scalar) | Tom Stellard | 2013-06-26 | 2 | -0/+12
    For any GENTYPE that isn't scalar, we need to implement a mixed
    vector/scalar version of clamp/max.

    This depends on the min() patches I sent to the list a few minutes ago.

    Patch by: Aaron Watry
    llvm-svn: 185003
* libclc: Implement the min(vec, scalar) version of the min builtin. | Tom Stellard | 2013-06-26 | 1 | -0/+6
    Checks if the current GENTYPE is scalar, and if not, then defines a
    separate implementation of the function which casts the second arg
    to vector before proceeding.
    Patch by: Aaron Watry
    llvm-svn: 185002
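The "cast the second arg to vector before proceeding" approach can be sketched in C by splatting the scalar across all lanes and reusing the vector/vector routine. The type and function names are hypothetical stand-ins for OpenCL's `int4` and the `min` overloads.

```c
#include <stdint.h>

/* Hypothetical 4-lane type standing in for OpenCL's int4. */
typedef struct { int32_t s[4]; } sketch_int4;

/* Element-wise minimum of two vectors. */
static sketch_int4 sketch_min_vv(sketch_int4 a, sketch_int4 b) {
    sketch_int4 r;
    for (int i = 0; i < 4; ++i)
        r.s[i] = (b.s[i] < a.s[i]) ? b.s[i] : a.s[i];
    return r;
}

/* Broadcast a scalar into every lane, the C analogue of (int4)(v). */
static sketch_int4 sketch_splat(int32_t v) {
    sketch_int4 r = { { v, v, v, v } };
    return r;
}

/* Mixed vector/scalar minimum: widen the scalar, then defer to min_vv. */
static sketch_int4 sketch_min_vs(sketch_int4 a, int32_t b) {
    return sketch_min_vv(a, sketch_splat(b));
}
```

With this structure only the vector/vector routine contains real logic; the mixed overload is a thin wrapper, which is why the commit adds just a few lines.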
* libclc: implement initial version of min() | Tom Stellard | 2013-06-26 | 3 | -0/+15
    This doesn't handle the integer cases for min(vector, scalar).
    Patch by: Aaron Watry
    llvm-svn: 185001
* libclc: Rename [add|sub]_sat.ll to [add|sub]_sat_if.ll | Tom Stellard | 2013-06-26 | 3 | -2/+2
    configure.py allows overloading *.cl with *.ll, but will only ever
    build the first file listed in SOURCES of ${file}.cl and ${file}.ll

    add_sat, sub_sat, (and the soon to be submitted clz) all define
    interfaces in ${function_name}.ll which are implemented in
    ${function_name}_impl.ll.

    Renaming the interface files is enough to get them to build again,
    fixing CL usage of these functions.

    Tested on clover/r600g.

    Patch by: Aaron Watry
    llvm-svn: 185000