| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
| |
The scalar float/double function bodies are a direct copy/paste,
aside from the removed (optional) code in float function body that
requires subnormals.
reviewers: jvesely
Patch by: Vedran Miletić <rivanvx@gmail.com>
llvm-svn: 268766
|
|
|
|
|
|
|
|
|
|
|
| |
Based on the amd-builtin, but explicitly vectorized for all sizes (not just
float4), and includes a vectorized double implementation.
Passes piglit (float) tests on pitcairn.
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 268708
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The scalar float/double function bodies are a direct copy/paste
with usage of the CLC wrappers to vectorize them.
This commit also adds in the FP_ILOGB0 and FP_ILOGBNAN macros which are
equal to the results of ilogb(0.0f) and ilogb(float nan) respectively.
v2: Add FP_ILOGB0 and FP_ILOGBNAN definitions
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
v1 Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 261639
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The float implementation is almost a direct port from the amd-builtins,
but instead of just having a scalar and float4 implementation, it has
a scalar and arbitrary width vector implementation.
The double scalar is also a direct port from AMD's builtin release.
The double vector implementation copies the logic in the float vector
implementation using the values from the double scalar version.
Both have been tested in piglit using tests sent to that project's
mailing list.
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 260114
|
|
|
|
|
|
|
|
| |
V2: use the reference implementation as suggested by Matt Arsenault
Patch By: Pavel Ondračka
llvm-svn: 258933
|
|
|
|
|
|
| |
This is a port from the AMD builtin library.
llvm-svn: 248780
|
|
|
|
|
|
| |
We need to use M_LOG2E instead of M_LOG2E_F.
llvm-svn: 243132
|
|
|
|
|
|
|
|
|
| |
Use the implementation was ported from the AMD builtin library rather
than LLVM Intrinsics.
This has been tested with piglit, OpenCV, and the ocl conformance tests.
llvm-svn: 243131
|
|
|
|
| |
llvm-svn: 243130
|
|
|
|
|
|
|
|
|
|
|
| |
Passing values less than 0 to the llvm.sqrt() intrinsic results in
undefined behavior, so we need to check the input and return NaN if
is is less than 0.
v2:
- Fix build failures.
llvm-svn: 241906
|
|
|
|
|
|
|
|
|
|
| |
Not all targets support the intrinsic, so it's better to have a
generic implementation which does not use it.
This exp2 implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.
llvm-svn: 237228
|
|
|
|
|
|
|
| |
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.
llvm-svn: 237138
|
|
|
|
|
|
| |
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 236941
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a generic implementation which just calls rsqrt.
Targets should override this if they want a faster implementation.
v2:
- Alphabettize SOURCES
v3 (Jan Vesely):
Limit to single precision types.
llvm-svn: 236915
|
|
|
|
|
|
|
|
| |
Ported from AMD builtin library, passes piglit on Turks.
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 236647
|
|
|
|
| |
llvm-svn: 236638
|
|
|
|
|
|
|
| |
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.
llvm-svn: 235620
|
|
|
|
|
|
|
| |
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.
llvm-svn: 234324
|
|
|
|
|
|
|
| |
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.
llvm-svn: 234323
|
|
|
|
|
|
|
| |
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.
llvm-svn: 233928
|
|
|
|
|
|
|
| |
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.
llvm-svn: 233927
|
|
|
|
|
|
|
| |
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.
llvm-svn: 233926
|
|
|
|
|
|
|
| |
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.
llvm-svn: 233925
|
|
|
|
|
|
|
|
| |
This ensures correct handling of NaNi.
This has been tested with piglit, OpenCV, and the ocl conformance tests.
llvm-svn: 233713
|
|
|
|
|
|
|
|
| |
This ensures correct handling of NaN.
This has been tested with piglit, OpenCV, and the ocl conformance tests.
llvm-svn: 233712
|
|
|
|
|
|
|
|
|
|
| |
This is a generic implementation which just calls sqrt. Targets should
override this if they want a faster implementation.
v2:
- Alphabetize SOURCES
llvm-svn: 232965
|
|
|
|
|
|
| |
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 232674
|
|
|
|
|
|
|
|
| |
It has been part of the common functions since 1.0
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 231137
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Ported from the libclc/amd-builtins branch
v2: Rename sincos_f_piby4 to __libclc__sincosf_piby4
Add cospi(double) implementation instead of using llvm.cos
Notes:
The sincosD_piby4.h file is mostly the same as the builtin implementation
released by AMD. The inline attribute declaration is changed, and M_PI is
used instead of a constant double. Otherwise, the only difference is that
the header explicitly enables the fp64 pragma.
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jeroen Ketema <j.ketema@imperial.ac.uk>
CC: Tom Stellard <tom@stellard.net>
CC: Matt Arsenault <Matthew.Arsenault@amd.com>
llvm-svn: 230641
|
|
|
|
|
|
|
|
| |
v2: Use constant and multiplication instead of division
v3: Use hex constants
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 227585
|
|
|
|
| |
llvm-svn: 220678
|
|
|
|
| |
llvm-svn: 219230
|
|
|
|
|
|
|
| |
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tom@stellard.net>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 219087
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Uses the algorithm:
tan(x) = sin(x) / sqrt(1-sin^2(x))
An alternative is:
tan(x) = sin(x) / cos(x)
Which produces more verbose bitcode and longer assembly.
Either way, the generated bitcode seems pretty nasty and a more optimized
but still precise-enough solution is welcome.
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 217511
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
asin(x) = atan2(x, sqrt( 1-x^2 ))
alternatively:
asin(x) = PI/2 - acos(x)
Use the atan2 implementation since it produces slightly shorter bitcode and
R600 machine code.
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 217510
|
|
|
|
|
|
|
|
|
|
| |
Passes the tests that were submitted to the piglit list
Tested on R600 (Pitcairn)
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 217509
|
|
|
|
|
|
|
|
|
| |
This was previously implemented with a macro and we were using
__builtin_copysign(), which takes double inputs for the float
version of copysign().
Reviewed-and-Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 217045
|
|
|
|
|
|
| |
This double version still uses @llvm.sin.
llvm-svn: 213762
|
|
|
|
|
|
| |
The double version still uses @llvm.cos.
llvm-svn: 213761
|
|
|
|
| |
llvm-svn: 213760
|
|
|
|
| |
llvm-svn: 213759
|
|
|
|
|
| |
Reviewed-by: Tom Stellard <tom@stellard.net>
llvm-svn: 211680
|
|
|
|
|
| |
Reviewed-by: Tom Stellard <tom@stellard.net>
llvm-svn: 211211
|
|
|
|
|
|
| |
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <tom@stellard.net>
llvm-svn: 211047
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use separate implementations instead of a macro
to ensure the constant multiplied with is of
higher precision.
v2: Use the correct formula, spotted by Dan Liew <daniel.liew@imperial.ac.uk>
Reviewed-by: Aaron Warty <awatry@gmail.com>
Reviewed-by: Tom Stellard <tom@stellard.net>
llvm-svn: 210891
|
|
|
|
|
|
|
| |
Patch by: Jeroen Ketema
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 204478
|
|
|
|
|
|
|
|
|
|
|
|
| |
OpenCL C lang says that trunc rounds towards zero.
llvm.trunc.* intrinsic rounds to integer not larger in magnitude.
These definitions are equivalent.
Patch by: Jan Vesely
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 197769
|
|
|
|
| |
llvm-svn: 195022
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are two implementations of nextafter():
1. Using clang's __builtin_nextafter. Clang replaces this builtin with
a call to nextafter which is part of libm. Therefore, this
implementation will only work for targets with an implementation of
libm (e.g. most CPU targets).
2. The other implementation is written in OpenCL C. This function is
known internally as __clc_nextafter and can be used by targets that
don't have access to libm.
llvm-svn: 192383
|
|
|
|
| |
llvm-svn: 188130
|