Use PPC reciprocal estimates with Newton iteration in fast-math mode

When unsafe FP math operations are enabled, we can use the fre[s] and frsqrte[s] instructions, which generate reciprocal and reciprocal-square-root estimates, together with some Newton iteration, to quickly generate floating-point division and sqrt results. All of these instructions are separately optional, so each has its own feature flag (except for the Altivec instructions, which are covered by the existing Altivec flag). Doing this is not only faster than using the IEEE-compliant fdiv/fsqrt instructions, but also allows these computations to be pipelined with other computations in order to hide their overall latency.

I've also added a couple of fnmsub patterns which turned out to be missing (and are necessary for good code generation of the Newton iterations). Altivec needs a similar fix, but that will probably be more complicated because fneg is expanded for Altivec's v4f32.

llvm-svn: 178617
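For readers unfamiliar with the technique, the following is a minimal scalar sketch of estimate-plus-Newton refinement in plain C++. It is not the code from this patch: the helper names (estimateRecip, estimateRsqrt, refineRecip, refineRsqrt, fastDiv, fastSqrt) are hypothetical, the hardware estimate is simulated by perturbing the exact result by a relative error of about 2^-8, and the default of two refinement steps is an assumption chosen to suit that simulated accuracy.

```cpp
#include <cmath>

// Hypothetical stand-in for a hardware reciprocal estimate (e.g. fres):
// returns 1/a with a deliberate relative error of about 2^-8.
inline float estimateRecip(float a) {
  return (1.0f / a) * (1.0f + 1.0f / 256.0f);
}

// Hypothetical stand-in for a reciprocal-sqrt estimate (e.g. frsqrtes).
inline float estimateRsqrt(float a) {
  return (1.0f / std::sqrt(a)) * (1.0f + 1.0f / 256.0f);
}

// One Newton step for f(x) = 1/x - a:  x' = x + x * (1 - a * x).
inline float refineRecip(float a, float x) {
  float r = std::fma(-a, x, 1.0f); // 1 - a*x: a negated multiply-subtract
  return std::fma(x, r, x);        // x + x*r
}

// One Newton step for f(y) = 1/y^2 - a:  y' = y * (1.5 - 0.5 * a * y * y).
inline float refineRsqrt(float a, float y) {
  float h = 0.5f * a;
  float t = std::fma(-h * y, y, 1.5f); // 1.5 - 0.5*a*y*y
  return y * t;
}

// n / d computed as n * (refined estimate of 1/d).
inline float fastDiv(float n, float d, int steps = 2) {
  float x = estimateRecip(d);
  for (int i = 0; i < steps; ++i)
    x = refineRecip(d, x);
  return n * x;
}

// sqrt(a) computed as a * (refined estimate of 1/sqrt(a)).
inline float fastSqrt(float a, int steps = 2) {
  if (a == 0.0f)
    return 0.0f; // avoid 0 * inf = NaN from the estimate
  float y = estimateRsqrt(a);
  for (int i = 0; i < steps; ++i)
    y = refineRsqrt(a, y);
  return a * y;
}
```

Each step roughly squares the relative error, so the number of steps needed depends on how accurate the initial estimate is. The residual terms 1 - a*x and 1.5 - 0.5*a*y*y have the b - a*c shape that fnmsub computes, which is presumably why the missing patterns mattered for the generated code. The results are not correctly rounded and special inputs (infinities, NaNs, denormals) are not handled, which is why this kind of lowering is only appropriate under unsafe FP math.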
author: Hal Finkel <hfinkel@anl.gov> 2013-04-03 04:01:11 +0000
committer: Hal Finkel <hfinkel@anl.gov> 2013-04-03 04:01:11 +0000
commit: 2e10331057986a3a35f90a898bf85e279c9f7276 (patch)
tree: db1f89dba5d9184c48d6931ae129932975760c7f /llvm/lib/Target/PowerPC/PPCISelLowering.h
parent: 05ba2a055df64a63a5d3c130e86e62b851343e8c (diff)
download: bcm5719-llvm-2e10331057986a3a35f90a898bf85e279c9f7276.tar.gz
bcm5719-llvm-2e10331057986a3a35f90a898bf85e279c9f7276.zip
1 files changed, 7 insertions, 0 deletions
diff --git a/llvm/lib/Target/PowerPC/PPCISelLowering.h b/llvm/lib/Target/PowerPC/PPCISelLowering.h
index 6690899e5d2..2ddcb25a23e 100644
--- a/llvm/lib/Target/PowerPC/PPCISelLowering.h
+++ b/llvm/lib/Target/PowerPC/PPCISelLowering.h
@@ -49,6 +49,9 @@ namespace llvm {
       /// unsigned integers.
       FCTIDUZ, FCTIWUZ,
 
+      /// Reciprocal estimate instructions (unary FP ops).
+      FRE, FRSQRTE,
+
       // VMADDFP, VNMSUBFP - The VMADDFP and VNMSUBFP instructions, taking
       // three v4f32 operands and producing a v4f32 result.
       VMADDFP, VNMSUBFP,
@@ -620,6 +623,10 @@ namespace llvm {
 
     SDValue lowerEH_SJLJ_SETJMP(SDValue Op, SelectionDAG &DAG) const;
     SDValue lowerEH_SJLJ_LONGJMP(SDValue Op, SelectionDAG &DAG) const;
+
+    SDValue DAGCombineFastRecip(SDNode *N, DAGCombinerInfo &DCI,
+                                bool UseOperand = true) const;
+    SDValue DAGCombineFastRecipFSQRT(SDNode *N, DAGCombinerInfo &DCI) const;
   };
 }