[AMDGPU] Improve reciprocal handling - bcm5719-llvm

author	Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2018-06-06 22:22:32 +0000
committer	Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2018-06-06 22:22:32 +0000
commit	df61be70b23460574d7a2875c897643daada9eee (patch)
tree	860c6a57a60bbdb0f3ebb880eaeeecf8308fe27a /lldb/packages/Python/lldbsuite/test/python_api/thread/TestThreadAPI.py
parent	566b402a131c7a045bb5d4cef8e3131440d5aae2 (diff)
download	bcm5719-llvm-df61be70b23460574d7a2875c897643daada9eee.tar.gz bcm5719-llvm-df61be70b23460574d7a2875c897643daada9eee.zip

[AMDGPU] Improve reciprocal handling

When denormals are supported we are producing a full division for
1.0f / x. That still can be replaced by the faster version:

    bool c = fabs(x) > 0x1.0p+96f;
    float s = c ? 0x1.0p-32f : 1.0f;
    x *= s;
    return s * v_rcp_f32(x)

in case if requested accuracy is 2.5ulp or less. The same version
is used if denormals are not supported for non 1.0 numerators,
where just v_rcp_f32 is then used for 1.0 numerator.

The optimization of 1/x is extended to the case -1/x, which is
the same except for the resulting sign bit.

OpenCL conformance passed with both enabled and disabled denorms.

Differential Revision: https://reviews.llvm.org/D47805
llvm-svn: 334142
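The scaling sequence from the commit message can be sketched in plain C as follows. This is only an illustration on the host CPU, not the code the backend emits: `rcp_f32_standin` is a hypothetical stand-in that emulates the `v_rcp_f32` instruction with an ordinary division, and the names `scaled_rcp` and `scaled_neg_rcp` are made up for this example.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical stand-in for the hardware v_rcp_f32 instruction.
   Assumption: emulated with an exact division; the real instruction
   is an approximation with its own error bound. */
static float rcp_f32_standin(float x) { return 1.0f / x; }

/* 1.0f / x using the pre-scaling sequence quoted in the commit message. */
static float scaled_rcp(float x) {
    /* For very large |x|, scale x down by 2^-32 first, so the value
       produced by the reciprocal itself stays in the normal range. */
    bool c = fabsf(x) > 0x1.0p+96f;
    float s = c ? 0x1.0p-32f : 1.0f;
    x *= s;
    /* Multiplying by s again undoes the scaling: s * (1 / (s * x)) * s = 1 / x
       up to rounding, and this final multiply may yield a denormal. */
    return s * rcp_f32_standin(x);
}

/* -1.0f / x: the same sequence, differing only in the sign of the result. */
static float scaled_neg_rcp(float x) { return -scaled_rcp(x); }
```

With this stand-in, an input such as `0x1.0p+127f` takes the scaled path: the reciprocal is computed on `0x1.0p+95f`, and only the final multiplication by `s` lands on the denormal value `0x1.0p-127f`.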

Diffstat (limited to 'lldb/packages/Python/lldbsuite/test/python_api/thread/TestThreadAPI.py')

0 files changed, 0 insertions, 0 deletions

