author | Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> | 2018-06-06 22:22:32 +0000 |
---|---|---|
committer | Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> | 2018-06-06 22:22:32 +0000 |
commit | df61be70b23460574d7a2875c897643daada9eee (patch) | |
tree | 860c6a57a60bbdb0f3ebb880eaeeecf8308fe27a /lldb/packages/Python/lldbsuite/test/python_api/sbstructureddata/TestStructuredDataAPI.py | |
parent | 566b402a131c7a045bb5d4cef8e3131440d5aae2 (diff) | |
[AMDGPU] Improve reciprocal handling
When denormals are supported we produce a full division for
1.0f / x. That can still be replaced by a faster version:
  bool c = fabs(x) > 0x1.0p+96f;
  float s = c ? 0x1.0p-32f : 1.0f;
  x *= s;
  return s * v_rcp_f32(x);
if the requested accuracy is 2.5ulp or less. The same version
is used for non-1.0 numerators when denormals are not supported;
in that case a 1.0 numerator uses just v_rcp_f32.
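
The scaling sequence above can be tried on a host with the following minimal C sketch. The helper `rcp_flush_f32` is a hypothetical stand-in for v_rcp_f32. It assumes, for illustration only, that the instruction flushes a denormal result to zero; it does not describe the actual hardware. The names `scaled_rcp_f32` and `check_scaled_rcp` are likewise invented for this example.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Hypothetical stand-in for v_rcp_f32 (an assumption, not the real
   instruction): computes 1/x and flushes a denormal result to zero. */
static float rcp_flush_f32(float x) {
    float r = 1.0f / x;
    return (fabsf(r) < FLT_MIN) ? copysignf(0.0f, r) : r;
}

/* The sequence from the commit message: scale very large inputs down by
   2^-32 before the reciprocal, then apply the same scale to the result. */
static float scaled_rcp_f32(float x) {
    int c = fabsf(x) > 0x1.0p+96f;
    float s = c ? 0x1.0p-32f : 1.0f;
    x *= s;
    return s * rcp_flush_f32(x);
}

/* Small self-check: 1/2^127 is denormal, so the stand-in alone loses it,
   while the scaled sequence recovers it on a host with denormals enabled. */
static void check_scaled_rcp(void) {
    assert(rcp_flush_f32(0x1.0p+127f) == 0.0f);
    assert(scaled_rcp_f32(0x1.0p+127f) == 0x1.0p-127f);
    assert(scaled_rcp_f32(4.0f) == 0.25f);
}
```

Calling `check_scaled_rcp()` from a test harness exercises both paths. The checks assume the host evaluates float arithmetic in IEEE single precision without flush-to-zero (for example, not built with `-ffast-math`).
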
The optimization of 1/x is extended to the case -1/x, which is the
same except for the resulting sign bit.
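
The claim that -1/x differs from 1/x only in the sign bit can be checked on a host with a short C sketch. Plain division stands in for the reciprocal instruction here, which is an assumption for illustration. `rcp_f32`, `neg_rcp_f32` and `check_neg_rcp` are hypothetical names, not functions from the patch.

```c
#include <assert.h>
#include <math.h>

/* Host division used as a stand-in for the reciprocal (an assumption). */
static float rcp_f32(float x) { return 1.0f / x; }

/* -1/x expressed as the reciprocal with its sign flipped. */
static float neg_rcp_f32(float x) { return -rcp_f32(x); }

/* Compare against a direct -1.0f / x division for a few sample inputs. */
static void check_neg_rcp(void) {
    const float xs[] = {3.0f, -0.75f, 0x1.0p+100f};
    for (int i = 0; i < 3; ++i) {
        float pos = rcp_f32(xs[i]);
        float neg = neg_rcp_f32(xs[i]);
        assert(neg == -1.0f / xs[i]);                 /* same magnitude */
        assert((signbit(neg) != 0) != (signbit(pos) != 0)); /* opposite sign */
    }
}
```

Negation is exact in IEEE arithmetic, so reusing the 1/x path and flipping the sign should not add rounding error.
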
OpenCL conformance passed with denormals both enabled and disabled.
Differential Revision: https://reviews.llvm.org/D47805
llvm-svn: 334142