| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
| |
Reviewer: Aaron Watry
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315342
|
|
|
|
|
|
|
|
|
|
| |
The implementation uses r600 sepcific intrinsics
LLVM-4 switched to _ro_t and _rw_t image types
Portions of the code can be moved back as more targets/llvm versions add image support
Reviewer: Aaron Watry
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315341
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
PTX does not differentiate between read and write fences. Hence, these a
lowered to a mem_fence call. The mem_fence function compiles to the
“member.cta” instruction, which commits all outstanding reads and writes
of a thread such that these become visible to all other threads in the same
CTA (i.e., work-group). The instruction does not differentiate between
global and local memory. Hence, the flags parameter is ignored, except
for deciding whether a “member.cta” instruction should be issued at all.
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315235
|
|
|
|
|
|
|
|
|
|
|
|
| |
This generates a "bar.sync 0” instruction, which not only causes the
threads to wait, but does acts as a memory fence, as required by
OpenCL. The fence does not differentiate between local and global
memory. Unfortunately, there is no similar instruction which does
not include a memory fence. Hence, we cannot optimize the case
where neither CLK_LOCAL_MEM_FENCE nor CLK_GLOBAL_MEM_FENCE is
passed.
llvm-svn: 315228
|
|
|
|
|
|
|
|
|
|
| |
for loop would only report status of the last command
v2: return '1'
call test instead of '['
Reviewer: Jeroen Ketema
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315193
|
|
|
|
|
|
| |
Reviewer: Jeroen Ketema
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315192
|
|
|
|
|
|
|
|
| |
using clang builtin results in external library call
Reviewer: Jeroen Ketema
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315191
|
|
|
|
|
|
|
|
| |
Drop unused clc/math/clc_nextafter.h header
Reviewer: Jeroen Ketema
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315190
|
|
|
|
|
|
| |
Reviewer: Jeroen Ketema
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315189
|
|
|
|
|
|
|
|
| |
fmin/fmax only need vector/scalar mix
Reviewer: Jeroen Ketema
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315188
|
|
|
|
|
|
|
|
| |
Fixes ~1200 external calls from nvtpx library.
Reviewer: Jeroen Ketema
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315170
|
|
|
|
|
|
| |
Reviewer: Jeroen Ketema
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315018
|
|
|
|
|
|
|
| |
The removes the vload_half unresolved calls from the nvptx libraries.
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 314998
|
|
|
|
|
| |
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 314925
|
|
|
|
|
|
|
| |
reviewer: Tom Stellard
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 314703
|
|
|
|
|
|
|
| |
reviewer: Tom Stellard
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 314702
|
|
|
|
|
|
|
|
|
| |
The generated llvm IR mostly identical. char/uchar case is a bit worse.
reviewer: Tom Stellard
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 314701
|
|
|
|
|
| |
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 314634
|
|
|
|
|
| |
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 314633
|
|
|
|
|
|
| |
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Acked-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 314548
|
|
|
|
|
|
| |
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Acked-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 314547
|
|
|
|
|
|
|
|
|
|
| |
v2: add shell shebang
improve error checks and reporting
v3: fix typo
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 314546
|
|
|
|
|
|
| |
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 314545
|
|
|
|
|
|
| |
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Acked-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 314544
|
|
|
|
|
|
| |
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Acked-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 314543
|
|
|
|
|
|
|
|
|
|
|
|
| |
Broken since r314111
V2: pointed out by Jan Vesely
- Use format() instead of % formating
Patch-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 314261
|
|
|
|
|
|
|
| |
reviewer: Aaron Watry
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 314112
|
|
|
|
|
|
|
|
|
| |
Fixes r314050
reviewer: Tom Stellard
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 314111
|
|
|
|
|
|
|
| |
This class isn't similar to anything from the STL, so it shouldn't use
the STL naming conventions.
llvm-svn: 314050
|
|
|
|
|
|
|
| |
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 313811
|
|
|
|
|
|
|
| |
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 313810
|
|
|
|
|
| |
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 313773
|
|
|
|
|
|
| |
Signed-off-by: Aaron Watry <awatry@gmail.com>
Acked-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 313107
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Also copy/modify the unary_intrin.inc from math/ to make the
intrinsic declaration somewhat reusable.
Passes CL CTS integer_ops/test_integer_ops popcount tests for CL 1.2
Tested-by on GCN 1.0 (Pitcairn)
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 312854
|
|
|
|
|
|
|
|
|
|
| |
v2: add vload(half) as well
make helpers amdgpu specific (NVPTX uses different private AS numbering)
use clang builtin on clang >= 6
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tstellar@redhat.com>
llvm-svn: 312839
|
|
|
|
|
|
|
|
|
|
| |
Add missing undefs
Make helpers amdgpu specific (NVPTX uses different numbering for private AS)
Use clang builtins on clang >= 6
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tstellar@redhat.com>
llvm-svn: 312838
|
|
|
|
|
|
|
|
| |
Just add the SOURCE_X.Y list to the list of sources if X.Y is the current llvm version.
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tstellar@redhat.com>
llvm-svn: 312837
|
|
|
|
|
|
|
|
| |
This file is only compiled for GCN which all share the same layout
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 312493
|
|
|
|
|
|
|
|
|
| |
We don't have memory fences for r600 so just call group barrier directly
Make sure that barrier is called even with 0 flags
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 312492
|
|
|
|
|
| |
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 312491
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was added in CL 1.1
Tested with a Radeon HD 7850 (Pitcairn) using the CL CTS via:
test_conformance/relationals/test_relationals shuffle_built_in_dual_input
v2: Add half support to shuffle2
Move shuffle2 to misc/
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 312404
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was added in CL 1.1
Tested with a Radeon HD 7850 (Pitcairn) using the CL CTS via:
test_conformance/relationals/test_relationals shuffle_built_in
v2: Add half-precision support to shuffle when available.
Move to misc/ and add section 6.12.12 to clc.h
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 312403
|
|
|
|
|
|
|
|
|
| |
Uses the same mechanism to enable fp16 as we use for fp64 when
processing clc.h
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 312402
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Specs require using fences when barrier() is invoked:
"The barrier function will either flush any variables stored in local memory
or queue a memory fence to ensure correct ordering of memory operations to local memory."
and
"The barrier function will queue a memory fence to ensure correct ordering
of memory operations to global memory."
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 311022
|
|
|
|
|
|
|
|
|
| |
v2: add more detailed comment about waitcnt instruction
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 311021
|
|
|
|
|
|
|
|
|
|
| |
I can't reproduce the error that made me add this.
Reported-by: Kim Gräsman <kim.grasman@gmail.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Kim Gräsman <kim.grasman@gmail.com>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 310968
|
|
|
|
|
|
|
|
|
|
| |
mostly prints and exceptions.
Few behavioral changes are documented in the text
Generated Makefile is identical between python2 and python3
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 309820
|
|
|
|
|
|
|
|
| |
also consolidate macros into one file, and rename to clcmacros.h
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 309358
|
|
|
|
|
|
|
|
| |
Fixes few piglits since clang r304193
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 304556
|
|
|
|
|
|
| |
mostly copied form amd_builtins
llvm-svn: 296233
|