| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewer: Aaron Watry
llvm-svn: 346078
|
|
|
|
|
|
| |
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewer: Aaron Watry
llvm-svn: 346077
|
|
|
|
|
|
| |
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewer: Aaron Watry
llvm-svn: 346076
|
|
|
|
|
|
| |
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewer: Aaron Watry
llvm-svn: 346075
|
|
|
|
|
|
|
|
|
| |
Same reason as amdgcn.
Fixes fmin, minmag CTS on turks.
Reviewer: Tom Stellard <tstellar@redhat.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 334228
|
|
|
|
|
|
|
|
|
| |
Same reason as amdgcn.
Fixes fmax, maxmag CTS on turks.
Reviewer: Tom Stellard <tstellar@redhat.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 334227
|
|
|
|
| |
llvm-svn: 279723
|
|
|
|
| |
llvm-svn: 279644
|
|
|
|
| |
llvm-svn: 279350
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Also fix get_global_id to consider offset
No idea how to add this for ptx, so they are stuck with the old get_global_id
implementation.
v2: split to a separate patch
v3: Switch R600 to use implicitarg.ptr
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 276443
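
For readers unfamiliar with the offset fix above, here is a minimal host-side sketch in plain C, not the libclc source. OpenCL defines the global ID in a dimension (for uniform work-groups) as group_id * local_size + local_id + global_offset; the "old" implementation mentioned above omits the last term. The struct work_item_dim and both function names are hypothetical stand-ins for values that a real target would read from hardware registers or an implicit-argument buffer.

```c
#include <stddef.h>

/* Hypothetical per-dimension launch state; on a real GPU target these
 * values come from registers or an implicit-argument buffer. */
typedef struct {
    size_t group_id;      /* index of the work-group */
    size_t local_size;    /* work-items per work-group */
    size_t local_id;      /* index of the work-item inside its group */
    size_t global_offset; /* offset supplied when the kernel is enqueued */
} work_item_dim;

/* Global ID without the offset term (the older behaviour). */
static size_t global_id_no_offset(const work_item_dim *d)
{
    return d->group_id * d->local_size + d->local_id;
}

/* Global ID that also accounts for the enqueue-time offset. */
static size_t global_id_with_offset(const work_item_dim *d)
{
    return global_id_no_offset(d) + d->global_offset;
}
```

With group_id = 2, local_size = 64, local_id = 5 and global_offset = 100, the two functions differ by exactly the offset.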
|
|
|
|
|
|
|
|
|
|
|
| |
v2: split into 2 patches
use clang builtins for other intrinsics as well
v3: Fix warnings
Switch r600 to use implicitarg.ptr
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 276442
|
|
|
|
| |
llvm-svn: 261042
|
|
|
|
|
|
|
|
|
|
|
| |
Most files remain in a common amdgpu directory.
Also switches barriers to use convergent
and llvm.amdgcn.s.barrier.
This now requires 3.9/trunk to build amdgcn.
llvm-svn: 260777
|
|
|
|
|
|
| |
Patch by: Zoltan Gilian
llvm-svn: 248161
|
|
|
|
|
|
| |
Patch by: Zoltan Gilian
llvm-svn: 248160
|
|
|
|
|
|
|
|
|
| |
Added get_image_* OpenCL builtins to the headers.
Added implementation to the r600 target.
Patch by: Zoltan Gilian
llvm-svn: 248159
|
|
|
|
|
|
|
| |
v2:
- Use same implementation for R600 and gcn.
llvm-svn: 241907
|
|
|
|
| |
llvm-svn: 236638
|
|
|
|
|
|
|
|
|
|
| |
v2: Fix function declaration
Add range metadata to r600 implementation
v3: change prefix to AMDGPU
Reviewed-by: Tom Stellard <tom@stellard.net>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 219793
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This generates bitcode which is indistinguishable from what was
hand-written for int32 types in v[load|store]_impl.ll.
v4: Use vec2+scalar for vec3 load/stores to prevent corruption (per Tom)
v3: Also remove unused generic/lib/shared/v[load|store]_impl.ll
v2: (Per Matt Arsenault) Fix alignment issues with vector load stores
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
CC: Matt Arsenault <Matthew.Arsenault@amd.com>
CC: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 216069
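
The v4 note above (vec2+scalar for vec3 load/stores) can be illustrated with a small host-side sketch in plain C. This is not the libclc implementation: int2_t, int3_t, vload3_sketch and vstore3_sketch are hypothetical names. It assumes the usual OpenCL convention that vload3/vstore3 address the elements at p + 3 * offset, and it assumes the corruption mentioned above came from touching a fourth element, which is one plausible reading of the commit message rather than something it states.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for OpenCL's int2 and int3 vector types. */
typedef struct { int x, y; } int2_t;
typedef struct { int x, y, z; } int3_t;

/* Load three packed ints as one 2-element access plus one scalar access. */
static int3_t vload3_sketch(size_t offset, const int *p)
{
    const int *base = p + 3 * offset;
    int2_t lo;
    memcpy(&lo, base, sizeof lo);        /* elements 0 and 1 */
    int3_t v = { lo.x, lo.y, base[2] };  /* element 2 as a scalar */
    return v;
}

/* Store three packed ints without writing a fourth element. */
static void vstore3_sketch(int3_t v, size_t offset, int *p)
{
    int *base = p + 3 * offset;
    int2_t lo = { v.x, v.y };
    memcpy(base, &lo, sizeof lo);        /* elements 0 and 1 */
    base[2] = v.z;                       /* element 2 as a scalar */
}
```

Storing into a four-int buffer at offset 0 leaves the fourth int untouched, which is the property the split is meant to preserve.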
|
|
|
|
|
|
|
|
| |
This will prevent LLVM optimization passes from creating illegal uses
of the barrier() intrinsic (e.g. calling barrier() from a conditional
that is not executed by all threads).
llvm-svn: 193753
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are two implementations of nextafter():
1. Using clang's __builtin_nextafter. Clang replaces this builtin with
a call to nextafter which is part of libm. Therefore, this
implementation will only work for targets with an implementation of
libm (e.g. most CPU targets).
2. The other implementation is written in OpenCL C. This function is
known internally as __clc_nextafter and can be used by targets that
don't have access to libm.
llvm-svn: 192383
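
To show what a nextafter() that does not rely on libm can look like, here is a minimal sketch in plain C for single precision. It is not the __clc_nextafter source: the name sketch_nextafterf and the helpers are hypothetical, and the approach (stepping the IEEE-754 bit pattern by one) is a common technique assumed here for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its 32-bit IEEE-754 pattern and back. */
static uint32_t float_to_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

static float bits_to_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Next representable float after x in the direction of y, without libm. */
static float sketch_nextafterf(float x, float y)
{
    if (x != x || y != y)   /* either argument is NaN: propagate it */
        return x + y;
    if (x == y)             /* also covers +0 versus -0 */
        return y;
    if (x == 0.0f) {
        /* Step off zero to the smallest subnormal carrying y's sign. */
        uint32_t sign = float_to_bits(y) & 0x80000000u;
        return bits_to_float(sign | 1u);
    }
    uint32_t ux = float_to_bits(x);
    /* Moving away from zero grows the magnitude bits; moving toward
     * zero shrinks them. The sign bit is untouched in both cases. */
    if ((x < y) == (x > 0.0f))
        ux += 1;
    else
        ux -= 1;
    return bits_to_float(ux);
}
```

The sketch relies on adjacent floats of the same sign having adjacent bit patterns, so a single integer increment or decrement moves to the neighbouring value, including across the subnormal and infinity boundaries.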
|
|
|
|
|
| |
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 190058
|
|
|
|
|
|
|
|
|
| |
The get_num_groups function was missing for r600g. I did the same
thing as the other workitem functions.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 187059
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The assembly optimizations were making unsafe assumptions about which address
spaces had which identifiers.
Also, fix vload/vstore with 64-bit pointers. This was broken previously on
Radeon SI.
This version still only has assembly versions of int/uint 2/4/8/16 for global
loads and stores on R600, but it does it in a way that would be very easily
extended to private/local/constant and could also be handled easily on other
architectures.
v2: 1) Leave v[load|store]_impl.ll in generic/lib
2) Remove vload_if.ll and vstore_if.ll interfaces
3) Fix address+offset calculations
4) Remove offset from assembly arg list
llvm-svn: 186416
|
|
|
|
|
|
| |
Reviewed and Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 185837
|
|
|
|
|
|
| |
This allows libclc to be built for R600 with upstream clang and LLVM.
llvm-svn: 184980
|
|
|
|
| |
llvm-svn: 184977
|
|
This includes a get_global_id() implementation and function stubs for
the other workitem and synchronization functions.
llvm-svn: 184975
|