path: root/libclc
Commit message | Author | Age | Files | Lines
...
* travis: Enable external function call checks on llvm-{4,5}Jan Vesely2017-10-101-0/+2
| | | | | | Reviewer: Aaron Watry Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 315342
* Make image builtins r600/llvm-3.9 onlyJan Vesely2017-10-1018-15/+15
| | | | | | | | | | The implementation uses r600-specific intrinsics. LLVM-4 switched to _ro_t and _rw_t image types. Portions of the code can be moved back as more targets/llvm versions add image support. Reviewer: Aaron Watry Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 315341
* Implement mem_fence on ptxJeroen Ketema2017-10-092-0/+16
| | | | | | | | | | | PTX does not differentiate between read and write fences. Hence, these are lowered to a mem_fence call. The mem_fence function compiles to the "membar.cta" instruction, which commits all outstanding reads and writes of a thread such that these become visible to all other threads in the same CTA (i.e., work-group). The instruction does not differentiate between global and local memory. Hence, the flags parameter is ignored, except for deciding whether a "membar.cta" instruction should be issued at all. Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 315235
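The flag handling described in this commit message can be modeled in plain C. This is only a sketch under stated assumptions: `issue_membar_cta` is a hypothetical stand-in for emitting the PTX `membar.cta` instruction, the flag values are illustrative rather than taken from the OpenCL headers, and `fences_issued` exists only so the behavior can be observed.

```c
/* Illustrative flag values; the real ones come from the OpenCL headers. */
#define CLK_LOCAL_MEM_FENCE  0x1u
#define CLK_GLOBAL_MEM_FENCE 0x2u

/* Counts how many fences the model has issued (observation aid only). */
static unsigned fences_issued = 0;

/* Hypothetical stand-in for emitting the PTX membar.cta instruction. */
static void issue_membar_cta(void) { fences_issued++; }

/* One fence covers both memory spaces, so the flags only decide
 * whether a fence is issued at all. */
static void mem_fence_model(unsigned flags) {
    if (flags & (CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE))
        issue_membar_cta();
}

/* PTX has no separate read/write fences, so both forward to mem_fence. */
static void read_mem_fence_model(unsigned flags)  { mem_fence_model(flags); }
static void write_mem_fence_model(unsigned flags) { mem_fence_model(flags); }
```

In this model, calling any of the three functions with flags of 0 issues nothing, while any combination of the two flags issues exactly one fence.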
* Make ptx barrier work irrespective of the cl_mem_fence_flagsJeroen Ketema2017-10-091-3/+1
| | | | | | | | | | | | This generates a "bar.sync 0" instruction, which not only causes the threads to wait, but also acts as a memory fence, as required by OpenCL. The fence does not differentiate between local and global memory. Unfortunately, there is no similar instruction which does not include a memory fence. Hence, we cannot optimize the case where neither CLK_LOCAL_MEM_FENCE nor CLK_GLOBAL_MEM_FENCE is passed. llvm-svn: 315228
* travis: Make sure we report failure even if only earlier checked files failJan Vesely2017-10-081-3/+5
| | | | | | | | | | The for loop would only report the status of the last command. v2: return '1'; call test instead of '['. Reviewer: Jeroen Ketema Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 315193
* check_external_calls.sh: Print number of calls in tested file.Jan Vesely2017-10-081-2/+3
| | | | | | Reviewer: Jeroen Ketema Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 315192
* ptx: Use __clc_nextafter to implement nextafterJan Vesely2017-10-082-0/+11
| | | | | | | | Using the clang builtin results in an external library call. Reviewer: Jeroen Ketema Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 315191
* Do not include clc_nextafter header globallyJan Vesely2017-10-084-16/+3
| | | | | | | | Drop unused clc/math/clc_nextafter.h header Reviewer: Jeroen Ketema Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 315190
* math/nextafter: Use custom declaration inc fileJan Vesely2017-10-082-4/+2
| | | | | | Reviewer: Jeroen Ketema Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 315189
* math/binary_decl.inc: Do not declare mixed float/double functionsJan Vesely2017-10-081-5/+1
| | | | | | | | fmin/fmax only need vector/scalar mix Reviewer: Jeroen Ketema Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 315188
* ldexp: Fix double precision function return typeJan Vesely2017-10-081-1/+1
| | | | | | | | Fixes ~1200 external calls from the nvptx library. Reviewer: Jeroen Ketema Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 315170
* configure: Fix handling of directories with compats only source listsJan Vesely2017-10-051-3/+7
| | | | | | Reviewer: Jeroen Ketema Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 315018
* Add vload_half helpers for ptxJeroen Ketema2017-10-054-0/+26
| | | | | | | This removes the vload_half unresolved calls from the nvptx libraries. Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 314998
* Add vstore_half helpers for ptxJeroen Ketema2017-10-044-0/+38
| | | | | Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 314925
* integer/sub_sat: Use clang builtin instead of llvm asmJan Vesely2017-10-027-215/+26
| | | | | | | reviewer: Tom Stellard Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 314703
* integer/add_sat: Use clang builtin instead of llvm asmJan Vesely2017-10-027-206/+29
| | | | | | | reviewer: Tom Stellard Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 314702
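The sub_sat and add_sat commits replace hand-written LLVM assembly with compiler builtins. The commit messages do not name the builtins, so the following is a minimal sketch that assumes the GCC/Clang `__builtin_add_overflow` and `__builtin_sub_overflow` builtins; the function names `add_sat_i32` and `sub_sat_i32` are hypothetical and cover only the 32-bit signed case.

```c
#include <stdint.h>

/* Saturating 32-bit signed add: clamp to the type limits instead of wrapping. */
static int32_t add_sat_i32(int32_t x, int32_t y) {
    int32_t r;
    if (__builtin_add_overflow(x, y, &r))
        /* Signed add overflows upward only when y is positive. */
        return y > 0 ? INT32_MAX : INT32_MIN;
    return r;
}

/* Saturating 32-bit signed subtract. */
static int32_t sub_sat_i32(int32_t x, int32_t y) {
    int32_t r;
    if (__builtin_sub_overflow(x, y, &r))
        /* x - y overflows upward only when y is negative. */
        return y < 0 ? INT32_MAX : INT32_MIN;
    return r;
}
```

The builtins report overflow through their return value, so the clamping direction can be chosen from the sign of the second operand without any inline assembly.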
* integer/clz: Use clang builtin instead of llvm asmJan Vesely2017-10-024-119/+8
| | | | | | | | | The generated llvm IR is mostly identical. The char/uchar case is a bit worse. reviewer: Tom Stellard Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 314701
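A builtin-based clz can be sketched in C as follows. This is an illustration rather than the libclc source: it assumes the GCC/Clang `__builtin_clz` builtin, a 32-bit `unsigned int`, and that clz(0) should yield the bit width of the type. The names `clz_u32` and `clz_u8` are hypothetical.

```c
#include <stdint.h>

/* __builtin_clz(0) is undefined, so zero is handled explicitly
 * and mapped to the bit width of the type. */
static uint32_t clz_u32(uint32_t x) {
    return x ? (uint32_t)__builtin_clz(x) : 32u;
}

/* 8-bit values are widened to 32 bits, then the 24 extra
 * leading zeros introduced by the widening are subtracted. */
static uint8_t clz_u8(uint8_t x) {
    return x ? (uint8_t)(__builtin_clz((unsigned)x) - 24) : 8u;
}
```

The narrow variant needs the extra widening and subtraction because the builtin operates on `unsigned int`.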
* Let get_work_dim take exactly 0 argumentsJeroen Ketema2017-10-013-3/+3
| | | | | Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 314634
* Do not circularly define NULLJeroen Ketema2017-10-011-1/+1
| | | | | Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 314633
* Fix amdgcn-amdhsa on llvm-3.9Jan Vesely2017-09-296-4/+77
| | | | | | Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Acked-by: Aaron Watry <awatry@gmail.com> llvm-svn: 314548
* travis: Check built libraries on llvm-3.9Jan Vesely2017-09-291-0/+4
| | | | | | Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Acked-by: Aaron Watry <awatry@gmail.com> llvm-svn: 314547
* Add script to check for unresolved function callsJan Vesely2017-09-291-0/+35
| | | | | | | | | | v2: add shell shebang; improve error checks and reporting. v3: fix typo. Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Aaron Watry <awatry@gmail.com> llvm-svn: 314546
* geometric: geometric functions are only supported for vector lengths <=4Jan Vesely2017-09-291-16/+0
| | | | | | Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Aaron Watry <awatry@gmail.com> llvm-svn: 314545
* travis: add build using llvm-3.9Jan Vesely2017-09-291-0/+14
| | | | | | Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Acked-by: Aaron Watry <awatry@gmail.com> llvm-svn: 314544
* Restore support for llvm-3.9Jan Vesely2017-09-299-4/+90
| | | | | | Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Acked-by: Aaron Watry <awatry@gmail.com> llvm-svn: 314543
* Add missing HAVE_LLVM define to fix build with latest llvmJan Vesely2017-09-261-1/+2
| | | | | | | | | | | | Broken since r314111. V2: pointed out by Jan Vesely - Use format() instead of % formatting. Patch-by: Pavel Ondračka <pavel.ondracka@gmail.com> Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com> Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 314261
* Rework atomic ops to use clang builtins rather than llvm asmJan Vesely2017-09-2513-202/+117
| | | | | | | reviewer: Aaron Watry Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 314112
* prepare_builtins: Fix compile breakage with older LLVMJan Vesely2017-09-251-0/+5
| | | | | | | | | Fixes r314050 reviewer: Tom Stellard Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 314111
* [Support] Rename tool_output_file to ToolOutputFile, NFCReid Kleckner2017-09-231-2/+2
| | | | | | | This class isn't similar to anything from the STL, so it shouldn't use the STL naming conventions. llvm-svn: 314050
* Implement cl_khr_int64_extended_atomics builtinsJan Vesely2017-09-2014-0/+172
| | | | | | | Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Aaron Watry <awatry@gmail.com> Tested-by: Aaron Watry <awatry@gmail.com> llvm-svn: 313811
* Implement cl_khr_int64_base_atomics builtinsJan Vesely2017-09-2014-0/+136
| | | | | | | Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Aaron Watry <awatry@gmail.com> Tested-by: Aaron Watry <awatry@gmail.com> llvm-svn: 313810
* Add travis CI configuration fileJan Vesely2017-09-201-0/+42
| | | | | Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 313773
* Add native_recip(x) as ((1)/(x))Aaron Watry2017-09-132-0/+2
| | | | | | Signed-off-by: Aaron Watry <awatry@gmail.com> Acked-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 313107
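The definition given in the commit title can be tried out in plain C. The macro body below is taken from the title; the helper `recip_example` is hypothetical and only shows the macro applied to a float value.

```c
/* Macro form taken from the commit title; the parentheses keep compound
 * arguments such as a + b grouped under the division. */
#define native_recip(x) ((1) / (x))

/* Hypothetical helper showing the macro applied to a float value. */
static float recip_example(float v) {
    return native_recip(v);
}
```

With a floating-point argument the integer literal 1 is converted to the argument's type before the division. In plain C an integer argument would give integer division, so this sketch is only meaningful for floating-point inputs.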
* integer: Add popcount implementation using ctpop intrinsicAaron Watry2017-09-093-0/+27
| | | | | | | | | | | | | Also copy/modify the unary_intrin.inc from math/ to make the intrinsic declaration somewhat reusable. Passes CL CTS integer_ops/test_integer_ops popcount tests for CL 1.2. Tested on GCN 1.0 (Pitcairn). Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 312854
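The commit implements popcount on top of the LLVM ctpop intrinsic. A rough C analogue, assuming the GCC/Clang `__builtin_popcount` family (which compilers typically lower to a population-count operation), might look like the sketch below. The names `popcount_u32` and `popcount_u64` are hypothetical.

```c
#include <stdint.h>

/* Counts set bits in a 32-bit value; assumes unsigned int is 32 bits wide. */
static uint32_t popcount_u32(uint32_t x) {
    return (uint32_t)__builtin_popcount(x);
}

/* For 64-bit inputs the long long variant of the builtin is used. */
static uint64_t popcount_u64(uint64_t x) {
    return (uint64_t)__builtin_popcountll(x);
}
```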
* Implement vload_half{,n} and vload(half)Jan Vesely2017-09-086-20/+132
| | | | | | | | | | v2: add vload(half) as well; make helpers amdgpu specific (NVPTX uses different private AS numbering); use clang builtin on clang >= 6. Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <tstellar@redhat.com> llvm-svn: 312839
* vstore: Cleanup and add vstore(half)Jan Vesely2017-09-086-12/+45
| | | | | | | | | | Add missing undefs. Make helpers amdgpu specific (NVPTX uses different numbering for private AS). Use clang builtins on clang >= 6. Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <tstellar@redhat.com> llvm-svn: 312838
* configure.py: Simplify compatibility sourcesJan Vesely2017-09-081-6/+4
| | | | | | | | Just add the SOURCE_X.Y list to the list of sources if X.Y is the current llvm version. Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <tstellar@redhat.com> llvm-svn: 312837
* amdgcn,waitcnt: Add datalayout infoJan Vesely2017-09-041-0/+2
| | | | | | | | This file is only compiled for GCN which all share the same layout Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Aaron Watry <awatry@gmail.com> llvm-svn: 312493
* r600: Cleanup barrier implementation.Jan Vesely2017-09-043-37/+5
| | | | | | | | | We don't have memory fences for r600, so just call the group barrier directly. Make sure that barrier is called even with 0 flags. Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Aaron Watry <awatry@gmail.com> llvm-svn: 312492
* Fixup clc.h commentJan Vesely2017-09-041-2/+1
| | | | | Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 312491
* relational: Implement shuffle2 builtinAaron Watry2017-09-024-1/+209
| | | | | | | | | | | | | | This was added in CL 1.1 Tested with a Radeon HD 7850 (Pitcairn) using the CL CTS via: test_conformance/relationals/test_relationals shuffle_built_in_dual_input v2: Add half support to shuffle2 Move shuffle2 to misc/ Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 312404
* relational: Implement shuffle builtinAaron Watry2017-09-024-0/+209
| | | | | | | | | | | | | | This was added in CL 1.1 Tested with a Radeon HD 7850 (Pitcairn) using the CL CTS via: test_conformance/relationals/test_relationals shuffle_built_in v2: Add half-precision support to shuffle when available. Move to misc/ and add section 6.12.12 to clc.h Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 312403
* Add halfN types and enable fp16 when generating builtin declarationsAaron Watry2017-09-022-0/+12
| | | | | | | | | Uses the same mechanism to enable fp16 as we use for fp64 when processing clc.h Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 312402
* amdgcn: rewrite barrier() using fence and clang __builtin_amdgcn_s_barrierJan Vesely2017-08-163-33/+8
| | | | | | | | | | | | | | Specs require using fences when barrier() is invoked: "The barrier function will either flush any variables stored in local memory or queue a memory fence to ensure correct ordering of memory operations to local memory." and "The barrier function will queue a memory fence to ensure correct ordering of memory operations to global memory." Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Aaron Watry <awatry@gmail.com> Tested-by: Aaron Watry <awatry@gmail.com> llvm-svn: 311022
* amdgcn: Implement {read_,write_,}mem_fence builtinJan Vesely2017-08-165-0/+58
| | | | | | | | | v2: add more detailed comment about waitcnt instruction Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Aaron Watry <awatry@gmail.com> Tested-by: Aaron Watry <awatry@gmail.com> llvm-svn: 311021
* configure.py: Drop explicit import of int builtinJan Vesely2017-08-151-6/+0
| | | | | | | | | | I can't reproduce the error that made me add this. Reported-by: Kim Gräsman <kim.grasman@gmail.com> Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Kim Gräsman <kim.grasman@gmail.com> Reviewed-by: Aaron Watry <awatry@gmail.com> llvm-svn: 310968
* configure.py: Make python3 friendlyJan Vesely2017-08-022-6/+19
| | | | | | | | | | Mostly prints and exceptions. The few behavioral changes are documented in the text. The generated Makefile is identical between python2 and python3. Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Aaron Watry <awatry@gmail.com> llvm-svn: 309820
* add __kernel_exec macrosJan Vesely2017-07-284-11/+19
| | | | | | | | also consolidate macros into one file, and rename to clcmacros.h Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Aaron Watry <awatry@gmail.com> llvm-svn: 309358
* generic: add missing get_work_dim includeJan Vesely2017-06-021-0/+1
| | | | | | | | Fixes a few piglit tests since clang r304193. Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Aaron Watry <awatry@gmail.com> llvm-svn: 304556
* math: Implement sinh functionJan Vesely2017-02-255-0/+240
| | | | | | mostly copied from amd_builtins llvm-svn: 296233