author: Benjamin Kramer <benny.kra@googlemail.com> 2012-03-19 00:43:34 +0000
committer: Benjamin Kramer <benny.kra@googlemail.com> 2012-03-19 00:43:34 +0000
commit: 57003a6768146c20befe5fc24d8a9b9bcb7ca0fd
tree: 20c63352044b56a9108f1a0258a3dc254f44ede0 /llvm/lib
parent: 93f2c7b58439fcb70f1c03f5edfb11fc6fb3d3aa
Add a note for -ffast-math optimization of vector norm.
llvm-svn: 153031
Diffstat (limited to 'llvm/lib')
-rw-r--r-- llvm/lib/Target/X86/README-SSE.txt | 19
1 files changed, 19 insertions, 0 deletions
diff --git a/llvm/lib/Target/X86/README-SSE.txt b/llvm/lib/Target/X86/README-SSE.txt
index a581993c3c6..624e56fa0f6 100644
--- a/llvm/lib/Target/X86/README-SSE.txt
+++ b/llvm/lib/Target/X86/README-SSE.txt
@@ -922,3 +922,22 @@ _test2: ## @test2
The insertps's of $0 are pointless complex copies.
//===---------------------------------------------------------------------===//
+
+[UNSAFE FP]
+
+void foo(double, double, double);
+void norm(double x, double y, double z) {
+ double scale = __builtin_sqrt(x*x + y*y + z*z);
+ foo(x/scale, y/scale, z/scale);
+}
+
+We currently generate an sqrtsd and 3 divsd instructions. This is bad: fp div is
+slow and not pipelined. In -ffast-math mode we could compute "1.0/scale" first
+and emit 3 mulsd in place of the divs. This can be done as a target-independent
+transform.
+
+If we're dealing with floats instead of doubles we could even replace the sqrtss
+and inversion with an rsqrtss instruction, which computes 1/sqrt faster at the
+cost of reduced accuracy.
+
+//===---------------------------------------------------------------------===//