| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
v2: fix fmax implementation
use consistent checks for __CLC_FP_SIZE
add missing TODOs
fix whitespace in definitions.h
v3: undef ZERO in modf.inc
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
reviewer: Jeroen Ketema <j.ketema@xs4all.nl>
Reviewed-by: Aaron Watry <awatry@gmail.com>
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 332677
|
|
|
|
|
|
|
|
| |
This will make adding cl_khr_fp16 support easier
Reviewed-by: Aaron Watry <awatry@gmail.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 326816
|
|
|
|
|
|
|
|
| |
Passes CTS on carrizo
Reviewer: Jeroen Ketema <j.ketema@xs4all.nl>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 324376
|
|
|
|
|
|
|
|
| |
Passes CTS on carrizo
Reviewer: Jeroen Ketema <j.ketema@xs4all.nl>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 324375
|
|
|
|
|
|
|
|
| |
Passes CTS on carrizo
Reviewer: Jeroen Ketema <j.ketema@xs4all.nl>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 324374
|
|
|
|
|
|
|
|
| |
Passes CTS on carrizo
Reviewer: Jeroen Ketema <j.ketema@xs4all.nl>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 324373
|
|
|
|
|
|
|
|
|
| |
Add another layer of indirection
This will be used for specific rounding modes
Reviewer: Jeroen Ketema <j.ketema@xs4all.nl>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 324371
|
|
|
|
|
|
| |
Reviewer: Jeroen Ketema <j.ketema@xs4all.nl>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 324370
|
|
|
|
|
|
|
|
|
|
| |
Float version passes newly posted piglit tests on turks, float and double pass on carrizo.
v2: scalar vstorea_half
v3: fix typo
Reviewer: Aaron Watry
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 316291
|
|
|
|
|
|
|
|
|
|
| |
Passes newly posted piglits on turks and carrizo
v2: add scalar vloada_half
v3: fix typo
Reviewer: Aaron Watry
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 316290
|
|
|
|
|
|
|
|
|
|
| |
v2: add vload(half) as well
make helpers amdgpu specific (NVPTX uses different private AS numbering)
use clang builtin on clang >= 6
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tstellar@redhat.com>
llvm-svn: 312839
|
|
|
|
|
|
|
|
|
|
| |
Add missing undefs
Make helpers amdgpu specific (NVPTX uses different numbering for private AS)
Use clang builtins on clang >= 6
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tstellar@redhat.com>
llvm-svn: 312838
|
|
|
|
|
|
| |
clang won't accept half precision loads and stores without cl_khr_fp16 since r281904
llvm-svn: 282106
|
|
|
|
|
| |
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 278962
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
OpenCL 1.0: "Returns y if y < x, otherwise it returns x. If x *and* y
are infinite or NaN, the return values are undefined."
OpenCL 1.1+: "Returns y if y < x, otherwise it returns x. If x *or* y
are infinite or NaN, the return values are undefined."
The 1.0 version is stricter so use that one.
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 276704
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This generates bitcode which is indistinguishable from what was
hand-written for int32 types in v[load|store]_impl.ll.
v4: Use vec2+scalar for vec3 load/stores to prevent corruption (per Tom)
v3: Also remove unused generic/lib/shared/v[load|store]_impl.ll
v2: (Per Matt Arsenault) Fix alignment issues with vector load stores
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
CC: Matt Arsenault <Matthew.Arsenault@amd.com>
CC: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 216069
|
|
|
|
|
|
|
|
| |
Not hooked up to R600 yet due to current lack of support, at least on EG.
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 188181
|
|
|
|
|
|
| |
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 188179
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The assembly optimizations were making unsafe assumptions about which address
spaces had which identifiers.
Also, fix vload/vstore with 64-bit pointers. This was broken previously on
Radeon SI.
This version still only has assembly versions of int/uint 2/4/8/16 for global
loads and stores on R600, but it does it in a way that would be very easily
extended to private/local/constant and could also be handled easily on other
architectures.
v2: 1) Leave v[load|store]_impl.ll in generic/lib
2) Remove vload_if.ll and vstore_if.ll interfaces
3) Fix address+offset calculations
3) Remove offset from assembly arg list
llvm-svn: 186416
|
|
|
|
|
|
|
|
|
|
|
| |
This commit gets us back to pure CLC and fixes offset calculations.
The next commit will re-enable the assembly implementation for R600,
fix bugs related to 64-bit address spaces, and also fix the
incorrect assumption that address space identifiers are the same in
all architectures.
llvm-svn: 186415
|
|
|
|
|
|
|
|
|
|
|
| |
libclc was defining and undefing GENTYPE and several other macros with
common names in its header files. This was preventing applications from
defining macros with identical names as command line arguments to the
compiler, because the definitions in the header files were masking the
macros defined as compiler arguements.
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 185838
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The assembly should be generic, but at least currently R600 only supports
32-bit stores of [u]int1/4, and I believe that only global is well-supported.
R600 lowers the 8/16 component stores to multiple 4-component stores.
The unoptimized C versions of the other stuff is left in place.
Patch by: Aaron Watry
llvm-svn: 185009
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The assembly should be generic, but at least currently R600 only supports
32-bit loads of int1/4, and I believe that only global is well-supported.
R600 lowers the 8/16 component vectors to multiple 4-bit loads.
The unoptimized C versions of the other stuff is left in place.
Patch by: Aaron Watry
llvm-svn: 185008
|
|
|
|
|
|
|
|
|
|
| |
Assumes that the target supports byte-addressable stores.
Completely unoptimized.
Patch by: Aaron Watry
llvm-svn: 185007
|
|
|
|
|
|
|
|
| |
Should work for all targets and data types. Completely unoptimized.
Patch by: Aaron Watry
llvm-svn: 185006
|
|
|
|
|
|
|
|
|
|
|
| |
For any GENTYPE that isn't scalar, we need to implement a mixed
vector/scalar version of clamp/max.
This depends on the min() patches I sent to the list a few minutes ago.
Patch by: Aaron Watry
llvm-svn: 185003
|
|
|
|
|
|
|
|
|
|
| |
Checks if the current GENTYPE is scalar, and if not, then defines a separate
implementation of the function which casts the second arg to vector before
proceeding.
Patch by: Aaron Watry
llvm-svn: 185002
|
|
|
|
|
|
|
|
| |
This doesn't handle the integer cases for min(vector, scalar).
Patch by: Aaron Watry
llvm-svn: 185001
|
|
|
|
|
|
|
|
| |
Max(x,y) is available for all integer/floating types.
Patch by: Aaron Watry
llvm-svn: 184995
|
|
Created under a new shared/ directory for functions which are available for
both integer and floating point types.
Patch by: Aaron Watry
llvm-svn: 184994
|