| Commit message | Author | Age | Files | Lines |
|
This has been tested with piglit, OpenCV, and the ocl conformance tests.
v2:
- Fix typo in smoothstep.h
llvm-svn: 230969
|
This has been tested with piglit, OpenCV, and the ocl conformance tests.
v2:
- Move to the common/ directory
llvm-svn: 230968
|
This has been tested with piglit, OpenCV, and the ocl conformance tests.
v2:
- Move to the common/ directory
llvm-svn: 230967
|
Ported from the libclc/amd-builtins branch
v2: Rename sincos_f_piby4 to __libclc__sincosf_piby4
Add cospi(double) implementation instead of using llvm.cos
Notes:
The sincosD_piby4.h file is mostly the same as the builtin implementation
released by AMD. The inline attribute declaration is changed, and M_PI is
used instead of a constant double. Otherwise, the only difference is that
the header explicitly enables the fp64 pragma.
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jeroen Ketema <j.ketema@imperial.ac.uk>
CC: Tom Stellard <tom@stellard.net>
CC: Matt Arsenault <Matthew.Arsenault@amd.com>
llvm-svn: 230641
|
v2: Use constant and multiplication instead of division
v3: Use hex constants
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 227585
|
llvm-svn: 219230
|
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tom@stellard.net>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 219087
|
This is a simple implementation which just copies data synchronously.
v2:
- Use size_t.
v3:
- Fix possible race condition by splitting the copy among multiple
work items.
llvm-svn: 219008
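The v3 note describes splitting the copy among work items so that no two of them write the same element. A minimal host-side C sketch of that idea, assuming a strided split: `copy_slice`, `simulate_group_copy`, `local_id`, and `local_size` are hypothetical names for illustration and are not taken from the libclc sources, and the outer loop only simulates work items by running them one after another.

```c
#include <stddef.h>

/* Hypothetical helper: the share of the copy done by one work item.
 * Work item `local_id` handles elements local_id, local_id + local_size, ...
 * so every element is written by exactly one work item. */
static void copy_slice(int *dst, const int *src, size_t num_elements,
                       size_t local_id, size_t local_size)
{
    for (size_t i = local_id; i < num_elements; i += local_size)
        dst[i] = src[i];
}

/* Host-side simulation: run every work item's slice in turn. */
static void simulate_group_copy(int *dst, const int *src, size_t num_elements,
                                size_t local_size)
{
    for (size_t id = 0; id < local_size; ++id)
        copy_slice(dst, src, num_elements, id, local_size);
}
```

On a real device the slices would run concurrently; the sketch only shows that the slices together cover the buffer without overlapping.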
|
This is a simple implementation which just copies data synchronously.
v2:
- Use size_t.
llvm-svn: 219007
|
This is a simple default implementation which just calls barrier().
v2:
- Only call barrier() once.
llvm-svn: 219006
|
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217918
|
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217917
|
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217916
|
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217915
|
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217914
|
Not used yet.
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217913
|
Not used yet...
v2: Correct int/uint behavior
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217912
|
The local versions of the atom_* functions were previously missing.
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217911
|
Uses the algorithm:
tan(x) = sin(x) / sqrt(1-sin^2(x))
An alternative is:
tan(x) = sin(x) / cos(x)
which produces more verbose bitcode and longer assembly.
Either way, the generated bitcode seems pretty nasty and a more optimized
but still precise-enough solution is welcome.
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 217511
|
asin(x) = atan2(x, sqrt( 1-x^2 ))
alternatively:
asin(x) = PI/2 - acos(x)
Use the atan2 implementation since it produces slightly shorter bitcode and
R600 machine code.
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 217510
|
Passes the tests that were submitted to the piglit list
Tested on R600 (Pitcairn)
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 217509
|
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 217247
|
v2: remove trailing newline
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 217246
|
v2: remove trailing newline
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 217245
|
v2: simplify and remove isnan leftovers
remove trailing newline
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 217244
|
v2: simplify and remove isinf leftovers
remove trailing newline
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 217243
|
llvm-svn: 217046
|
This was previously implemented with a macro and we were using
__builtin_copysign(), which takes double inputs for the float
version of copysign().
Reviewed-and-Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 217045
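The type mismatch described here can be illustrated in plain C. The sketch below uses the standard `copysignf` from `<math.h>` as a stand-in for a float-typed builtin; the wrapper name `copysign_float` is hypothetical, and the actual libclc change is not reproduced.

```c
#include <math.h>

/* Float-typed copysign: copysignf takes and returns float, so the operation
 * stays in single precision. copysign (like __builtin_copysign) is the
 * double-typed variant and would promote float arguments to double. */
static float copysign_float(float magnitude, float sign)
{
    return copysignf(magnitude, sign);
}
```

The sign is taken from the sign bit of the second argument, so a negative zero also yields a negative result.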
|
v2: Fix trailing whitespace
Fix signed long overflow
improve comment
v3: fix typo
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tom@stellard.net>
llvm-svn: 216923
|
This reverts commit cf62eded8b623a1c10d3692d25e5882b7939f564.
I didn't mean to commit this... Jan has a v3 incoming
llvm-svn: 216322
|
v2: Fix trailing whitespace
Fix signed long overflow
improve comment
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 216320
|
The default implementation is a no-op. Targets should override this
with their own implementations.
llvm-svn: 216127
|
This generates bitcode which is indistinguishable from what was
hand-written for int32 types in v[load|store]_impl.ll.
v4: Use vec2+scalar for vec3 load/stores to prevent corruption (per Tom)
v3: Also remove unused generic/lib/shared/v[load|store]_impl.ll
v2: (Per Matt Arsenault) Fix alignment issues with vector load stores
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
CC: Matt Arsenault <Matthew.Arsenault@amd.com>
CC: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 216069
|
v2: remove the initial undef
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 214568
|
v2: remove the initial undef
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 214567
|
This double version still uses @llvm.sin.
llvm-svn: 213762
|
The double version still uses @llvm.cos.
llvm-svn: 213761
|
llvm-svn: 213760
|
llvm-svn: 213759
|
v2: Use relational macros instead of hand-rolled ones
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 213320
|
v2: Use relational macros instead of hand-rolled macros
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 213319
|
v2: Use relational macros instead of hand-rolled macros
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 213318
|
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 213317
|
Vector true is -1, not 1, which means we need to use the relational unary
macro instead of the normal unary builtin one.
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 213316
|
relational.h includes relational macros for defining functions which need to
return 1 for scalar true and -1 for vector true.
I believe that this is the only place that this behavior is required, so the
macro is placed at its lowest useful level (same directory as it is used in).
This also creates re-usable unary/binary declaration and floatn includes which
should simplify relational builtin declarations.
Mostly patterned off of include/math/[binary_decl|unary_decl|floatn].inc
but with required changes for relational functions.
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 213315
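The scalar/vector convention described in this message (1 for scalar true, -1 for each true vector lane) can be simulated on the host. The sketch below is only an illustration: it models a 4-wide vector as a plain array, and `isnan_scalar` and `isnan_vec4` are hypothetical names, not the macros from relational.h.

```c
#include <math.h>

/* Scalar form: true is 1, false is 0. */
static int isnan_scalar(float x)
{
    return isnan(x) ? 1 : 0;
}

/* Vector form, simulated with arrays: each true lane is -1 (all bits set),
 * each false lane is 0. */
static void isnan_vec4(const float x[4], int out[4])
{
    for (int i = 0; i < 4; ++i)
        out[i] = isnan(x[i]) ? -1 : 0;
}
```

A -1 lane has every bit set, which is what lets a vector comparison result act as a per-lane mask.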
|
The vector components were mistakenly using () instead of {}, which caused
all but the last vector component to be dropped on the floor.
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jeroen Ketema <j.ketema@imperial.ac.uk>
llvm-svn: 211733
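The effect described here matches how the C comma operator behaves: a parenthesized list evaluates every operand but keeps only the last one, while a braced initializer fills each component. The following plain C sketch shows that difference under the assumption that this is the mechanism behind the bug; `int4_like`, `last_of`, and `make4` are hypothetical stand-ins, and the affected libclc code is not reproduced.

```c
/* Hypothetical stand-in for a 4-component vector type. */
typedef struct { int s0, s1, s2, s3; } int4_like;

/* Parentheses: this is the comma operator, so only d survives.
 * The (void) casts are there only to silence unused-value warnings. */
static int last_of(int a, int b, int c, int d)
{
    return ((void)a, (void)b, (void)c, d);
}

/* Braces: a compound literal initializes every component. */
static int4_like make4(int a, int b, int c, int d)
{
    return (int4_like){a, b, c, d};
}
```

With inputs 1, 2, 3, 4 the parenthesized form yields just 4, while the braced form keeps all four values.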
|
v2 Changes:
- use __builtin_signbit instead of shifting by hand
- significantly improve vector shuffling
- Works correctly now for signbit(float16) on radeonsi
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 211696
|
Reviewed-by: Tom Stellard <tom@stellard.net>
llvm-svn: 211680
|
v2: - use quotes instead of <>
- add include to r600/lib/math/nextafter.c changed
Reviewed-by: Tom Stellard <tom@stellard.net>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 211576
|
Also change the order of the functions to be consistent with
the order in the header files.
llvm-svn: 211496
|