path: root/libclc/generic/lib/shared
Commit message | Author | Age | Files | Lines
* Add initial support for half precision builtins | Jan Vesely | 2018-05-17 | 1 | -1/+3
    v2: fix fmax implementation
        use consistent checks for __CLC_FP_SIZE
        add missing TODOs
        fix whitespace in definitions.h
    v3: undef ZERO in modf.inc
    Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
    reviewer: Jeroen Ketema <j.ketema@xs4all.nl>
    Reviewed-by: Aaron Watry <awatry@gmail.com>
    Tested-by: Aaron Watry <awatry@gmail.com>
    llvm-svn: 332677
* Move cl_khr_fp64 extension enablement to gentype include lists | Jan Vesely | 2018-03-06 | 3 | -12/+0
    This will make adding cl_khr_fp16 support easier
    Reviewed-by: Aaron Watry <awatry@gmail.com>
    Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
    llvm-svn: 326816
* Add vstore_half_rte implementation | Jan Vesely | 2018-02-06 | 1 | -1/+44
    Passes CTS on carrizo
    Reviewer: Jeroen Ketema <j.ketema@xs4all.nl>
    Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
    llvm-svn: 324376
* Add vstore_half_rtp implementation | Jan Vesely | 2018-02-06 | 1 | -1/+10
    Passes CTS on carrizo
    Reviewer: Jeroen Ketema <j.ketema@xs4all.nl>
    Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
    llvm-svn: 324375
* Add vstore_half_rtn implementation | Jan Vesely | 2018-02-06 | 1 | -1/+41
    Passes CTS on carrizo
    Reviewer: Jeroen Ketema <j.ketema@xs4all.nl>
    Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
    llvm-svn: 324374
* Add vstore_half_rtz implementation | Jan Vesely | 2018-02-06 | 1 | -1/+34
    Passes CTS on carrizo
    Reviewer: Jeroen Ketema <j.ketema@xs4all.nl>
    Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
    llvm-svn: 324373
* vstore_half: Add support for custom rounding functions | Jan Vesely | 2018-02-06 | 1 | -23/+39
    Add another layer of indirection
    This will be used for specific rounding modes
    Reviewer: Jeroen Ketema <j.ketema@xs4all.nl>
    Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
    llvm-svn: 324371
* vstore_half: Make sure the helper function is always inline | Jan Vesely | 2018-02-06 | 1 | -1/+1
    Reviewer: Jeroen Ketema <j.ketema@xs4all.nl>
    Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
    llvm-svn: 324370
* shared: Implement aligned vector stores (vstorea_half) | Jan Vesely | 2017-10-22 | 2 | -20/+31
    Float version passes newly posted piglit tests on turks, float and double pass on carrizo.
    v2: scalar vstorea_half
    v3: fix typo
    Reviewer: Aaron Watry
    Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
    llvm-svn: 316291
* shared: Implement aligned vector loads (vloada_half) | Jan Vesely | 2017-10-22 | 2 | -10/+26
    Passes newly posted piglits on turks and carrizo
    v2: add scalar vloada_half
    v3: fix typo
    Reviewer: Aaron Watry
    Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
    llvm-svn: 316290
* Implement vload_half{,n} and vload(half) | Jan Vesely | 2017-09-08 | 2 | -0/+72
    v2: add vload(half) as well
        make helpers amdgpu specific (NVPTX uses different private AS numbering)
        use clang builtin on clang >= 6
    Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
    Reviewed-by: Tom Stellard <tstellar@redhat.com>
    llvm-svn: 312839
* vstore: Cleanup and add vstore(half) | Jan Vesely | 2017-09-08 | 2 | -45/+33
    Add missing undefs
    Make helpers amdgpu specific (NVPTX uses different numbering for private AS)
    Use clang builtins on clang >= 6
    Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
    Reviewed-by: Tom Stellard <tstellar@redhat.com>
    llvm-svn: 312838
* Provide vstore_half helper to workaround clc restrictions | Jan Vesely | 2016-09-21 | 3 | -26/+74
    clang won't accept half precision loads and stores without cl_khr_fp16 since r281904
    llvm-svn: 282106
* Implement vstore_half{,n} | Jan Vesely | 2016-08-17 | 2 | -0/+42
    Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
    llvm-svn: 278962
* Make min follow the OCL 1.0 specs | Jan Vesely | 2016-07-25 | 1 | -2/+2
    OpenCL 1.0: "Returns y if y < x, otherwise it returns x. If x *and* y are infinite or NaN, the return values are undefined."
    OpenCL 1.1+: "Returns y if y < x, otherwise it returns x. If x *or* y are infinite or NaN, the return values are undefined."
    The 1.0 version is stricter so use that one.
    Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
    llvm-svn: 276704
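The comparison rule quoted in this commit message can be sketched in plain C. This is only an illustration of the quoted wording, not the libclc source; `spec_min` and `spec_min_demo` are hypothetical names introduced here.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper (not libclc code): the rule quoted from the spec,
 * "returns y if y < x, otherwise it returns x". */
float spec_min(float x, float y) {
    return (y < x) ? y : x;
}

/* Small usage sketch. Any comparison involving NaN is false,
 * so a NaN in either position makes the helper return x. */
void spec_min_demo(void) {
    assert(spec_min(3.0f, 1.0f) == 1.0f);   /* y < x, so y is returned */
    assert(isnan(spec_min(NAN, 1.0f)));     /* comparison false, x (NaN) returned */
    assert(spec_min(1.0f, NAN) == 1.0f);    /* comparison false, x returned */
}
```

With a single NaN operand the result therefore depends on which argument holds the NaN, which is the case the two spec versions treat differently.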
* vload/vstore: Use casts instead of scalarizing everything in CLC version | Aaron Watry | 2014-08-20 | 4 | -186/+21
    This generates bitcode which is indistinguishable from what was hand-written for int32 types in v[load|store]_impl.ll.
    v4: Use vec2+scalar for vec3 load/stores to prevent corruption (per Tom)
    v3: Also remove unused generic/lib/shared/v[load|store]_impl.ll
    v2: (Per Matt Arsenault) Fix alignment issues with vector load stores
    Signed-off-by: Aaron Watry <awatry@gmail.com>
    Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
    CC: Matt Arsenault <Matthew.Arsenault@amd.com>
    CC: Tom Stellard <thomas.stellard@amd.com>
    llvm-svn: 216069
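The "vec2+scalar for vec3" note in v4 can be illustrated with a plain C sketch. The commit does not spell out the cause of the corruption; the sketch assumes it is the usual one, namely that an OpenCL 3-component vector occupies the storage of four components, so a whole-vector store through a cast pointer would also overwrite the element after the third. The type `int3_padded` and the function `store3` are hypothetical stand-ins, not libclc code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for OpenCL int3: three used lanes plus one padding lane. */
typedef struct { int x, y, z, pad; } int3_padded;

/* Sketch of a vstore3-style store to p + offset * 3:
 * write the first two lanes together, then the third as a scalar,
 * so the element following the third lane is never touched. */
void store3(int3_padded v, size_t offset, int *p) {
    int *dst = p + offset * 3;
    int pair[2] = { v.x, v.y };
    memcpy(dst, pair, sizeof pair);  /* two-element store */
    dst[2] = v.z;                    /* scalar store */
}
```

For example, storing `{1, 2, 3}` at offset 1 into a 7-element buffer writes elements 3 to 5 and leaves element 6 unchanged.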
* Add intN vloadN() implementations for address spaces 3 and 4 | Aaron Watry | 2013-08-12 | 1 | -0/+60
    Not hooked up to R600 yet due to current lack of support, at least on EG.
    Signed-off-by: Aaron Watry <awatry@gmail.com>
    Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
    llvm-svn: 188181
* Add vload* for addrspace(2) and use as constant load for R600 | Aaron Watry | 2013-08-12 | 2 | -2/+34
    Signed-off-by: Aaron Watry <awatry@gmail.com>
    Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
    llvm-svn: 188179
* Fix and re-enable R600 vload/vstore assembly | Aaron Watry | 2013-07-16 | 2 | -56/+35
    The assembly optimizations were making unsafe assumptions about which address spaces had which identifiers.
    Also, fix vload/vstore with 64-bit pointers. This was broken previously on Radeon SI.
    This version still only has assembly versions of int/uint 2/4/8/16 for global loads and stores on R600, but it does it in a way that would be very easily extended to private/local/constant and could also be handled easily on other architectures.
    v2: 1) Leave v[load|store]_impl.ll in generic/lib
        2) Remove vload_if.ll and vstore_if.ll interfaces
        3) Fix address+offset calculations
        4) Remove offset from assembly arg list
    llvm-svn: 186416
* libclc: vload/vstore disable assembly and fix offset calculation | Aaron Watry | 2013-07-16 | 4 | -243/+20
    This commit gets us back to pure CLC and fixes offset calculations.
    The next commit will re-enable the assembly implementation for R600, fix bugs related to 64-bit address spaces, and also fix the incorrect assumption that address space identifiers are the same in all architectures.
    llvm-svn: 186415
* Add __CLC_ prefix to all macro definitions in headers | Tom Stellard | 2013-07-08 | 8 | -30/+30
    libclc was defining and undefing GENTYPE and several other macros with common names in its header files. This was preventing applications from defining macros with identical names as command line arguments to the compiler, because the definitions in the header files were masking the macros defined as compiler arguments.
    Reviewed-by: Aaron Watry <awatry@gmail.com>
    llvm-svn: 185838
* libclc: Add assembly versions of vstore for global [u]int4/8/16 | Tom Stellard | 2013-06-26 | 3 | -6/+166
    The assembly should be generic, but at least currently R600 only supports 32-bit stores of [u]int1/4, and I believe that only global is well-supported.
    R600 lowers the 8/16 component stores to multiple 4-component stores.
    The unoptimized C versions of the other stuff are left in place.
    Patch by: Aaron Watry
    llvm-svn: 185009
* libclc: Add assembly versions of vload for global int4/8/16 | Tom Stellard | 2013-06-26 | 3 | -2/+160
    The assembly should be generic, but at least currently R600 only supports 32-bit loads of int1/4, and I believe that only global is well-supported.
    R600 lowers the 8/16 component vectors to multiple 4-component loads.
    The unoptimized C versions of the other stuff are left in place.
    Patch by: Aaron Watry
    llvm-svn: 185008
* libclc: Initial vstore implementation | Tom Stellard | 2013-06-26 | 1 | -0/+56
    Assumes that the target supports byte-addressable stores.
    Completely unoptimized.
    Patch by: Aaron Watry
    llvm-svn: 185007
* libclc: Initial vload implementation | Tom Stellard | 2013-06-26 | 1 | -0/+47
    Should work for all targets and data types. Completely unoptimized.
    Patch by: Aaron Watry
    llvm-svn: 185006
* libclc: Add clamp(vec, scalar, scalar) and max(vec, scalar) | Tom Stellard | 2013-06-26 | 2 | -0/+12
    For any GENTYPE that isn't scalar, we need to implement a mixed vector/scalar version of clamp/max.
    This depends on the min() patches I sent to the list a few minutes ago.
    Patch by: Aaron Watry
    llvm-svn: 185003
* libclc: Implement the min(vec, scalar) version of the min builtin. | Tom Stellard | 2013-06-26 | 1 | -0/+6
    Checks if the current GENTYPE is scalar, and if not, then defines a separate implementation of the function which casts the second arg to vector before proceeding.
    Patch by: Aaron Watry
    llvm-svn: 185002
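The approach this commit describes, widening the scalar second argument to a vector and then reusing the vector/vector path, might look roughly like the following in plain C. This is a sketch under stated assumptions, not the libclc implementation: `float4_t`, `splat4`, `min4` and `min4_scalar` are hypothetical names, and a 4-lane struct stands in for an OpenCL `float4`.

```c
/* Hypothetical stand-in for an OpenCL float4. */
typedef struct { float s[4]; } float4_t;

/* Widen a scalar to a vector by copying it into every lane. */
float4_t splat4(float v) {
    float4_t r = { { v, v, v, v } };
    return r;
}

/* Lane-wise vector/vector min using the "y if y < x, else x" rule. */
float4_t min4(float4_t x, float4_t y) {
    float4_t r;
    for (int i = 0; i < 4; ++i)
        r.s[i] = (y.s[i] < x.s[i]) ? y.s[i] : x.s[i];
    return r;
}

/* Mixed vector/scalar overload: convert the scalar, then defer to min4. */
float4_t min4_scalar(float4_t x, float y) {
    return min4(x, splat4(y));
}
```

Calling `min4_scalar` on lanes `{1, 5, 3, 7}` with the scalar `4` gives `{1, 4, 3, 4}`. Keeping the mixed overload as a thin wrapper means the comparison logic lives in one place.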
* libclc: implement initial version of min() | Tom Stellard | 2013-06-26 | 2 | -0/+14
    This doesn't handle the integer cases for min(vector, scalar).
    Patch by: Aaron Watry
    llvm-svn: 185001
* libclc: Move max builtin to shared/ | Tom Stellard | 2013-06-26 | 2 | -0/+14
    Max(x,y) is available for all integer/floating types.
    Patch by: Aaron Watry
    llvm-svn: 184995
* libclc: Add clamp() builtin for integer/floating point | Tom Stellard | 2013-06-26 | 2 | -0/+14
    Created under a new shared/ directory for functions which are available for both integer and floating point types.
    Patch by: Aaron Watry
    llvm-svn: 184994