[X86] Heuristic to selectively build Newton-Raphson SQRT estimation

On modern Intel processors, hardware SQRT is in many cases faster than RSQRT followed by Newton-Raphson (NR) refinement. The patch introduces a simple heuristic to choose between the hardware SQRT instruction and NR software estimation.

The patch treats scalars and vectors differently: for scalars the compiler should optimize for latency, while for vectors it should optimize for throughput. This rests on the assumption that throughput-bound code is likely to be vectorized.

In practice, the patch disables scalar NR on big cores and disables NR completely on Skylake. First, scalar SQRT has shorter latency than the NR sequence on big cores. Second, vector SQRT has been greatly improved in Skylake and has better throughput than NR.

Differential Revision: https://reviews.llvm.org/D21379
llvm-svn: 277725
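The RSQRT-plus-refinement sequence that the message weighs against hardware SQRT can be modelled in scalar C++. This is a sketch under stated assumptions, not LLVM's `buildSqrtEstimate`: the hardware RSQRT estimate is replaced by the well-known `0x5f3759df` bit trick (an assumption for illustration only), each Newton-Raphson step applies y' = y(1.5 - 0.5·a·y²), and sqrt(a) is recovered as a·rsqrt(a). The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Stand-in for the hardware RSQRT estimate (assumption: the classic
// 0x5f3759df bit trick, which is NOT what RSQRTSS/RSQRTPS compute).
// Gives a rough first guess for 1/sqrt(a), valid for finite a > 0.
float rsqrtInitialGuess(float a) {
  std::uint32_t bits;
  std::memcpy(&bits, &a, sizeof bits);  // reinterpret the float bits safely
  bits = 0x5f3759dfu - (bits >> 1);
  float y;
  std::memcpy(&y, &bits, sizeof y);
  return y;
}

// One Newton-Raphson step for f(y) = 1/y^2 - a, whose root is y = 1/sqrt(a):
//   y' = y * (1.5 - 0.5 * a * y * y)
float newtonRaphsonStep(float a, float y) {
  return y * (1.5f - 0.5f * a * y * y);
}

// sqrt(a) computed as a * rsqrt(a) with `iterations` refinement steps.
// Each step depends on the previous one, so latency grows with the
// iteration count. Restricted to finite a > 0 in this sketch.
float sqrtViaNewtonRaphson(float a, int iterations) {
  float y = rsqrtInitialGuess(a);
  for (int i = 0; i < iterations; ++i)
    y = newtonRaphsonStep(a, y);
  return a * y;
}
```

Each NR step roughly doubles the number of correct bits, but it is a dependent chain of multiplies; that chain is the latency a single hardware SQRT competes with. A real lowering must also handle a == 0, since hardware RSQRT of 0 yields infinity and 0·∞ is NaN; this sketch avoids the case by restricting its input.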
author: Nikolai Bozhenov <nikolai.bozhenov@intel.com> 2016-08-04 12:47:28 +0000
committer: Nikolai Bozhenov <nikolai.bozhenov@intel.com> 2016-08-04 12:47:28 +0000
commit: f679530ba18023d29765bde397fa77048bf17985
tree: 26d32ee662bbb6f153eb39b81350d1d6859cd044 /llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
parent: 8950cead7f2032d4dee6b17be4eb4c6b5d755403
1 file changed, 6 insertions, 2 deletions
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index df928451fc5..5bcea642e01 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -8907,14 +8907,18 @@ SDValue DAGCombiner::visitFREM(SDNode *N) {
 }
 
 SDValue DAGCombiner::visitFSQRT(SDNode *N) {
-  if (!DAG.getTarget().Options.UnsafeFPMath || TLI.isFsqrtCheap())
+  if (!DAG.getTarget().Options.UnsafeFPMath)
+    return SDValue();
+
+  SDValue N0 = N->getOperand(0);
+  if (TLI.isFsqrtCheap(N0, DAG))
     return SDValue();
 
   // TODO: FSQRT nodes should have flags that propagate to the created nodes.
   // For now, create a Flags object for use with all unsafe math transforms.
   SDNodeFlags Flags;
   Flags.setUnsafeAlgebra(true);
-  return buildSqrtEstimate(N->getOperand(0), &Flags);
+  return buildSqrtEstimate(N0, &Flags);
 }
 
 /// copysign(x, fp_extend(y)) -> copysign(x, y)
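The decision the patched `visitFSQRT` makes can be modelled outside LLVM as below. This is a hypothetical sketch: `CpuModel`, its two flags, `chooseSqrtLowering` and the presets are illustrative names, not LLVM's API. Only the control flow (bail out without unsafe FP math, then ask an operand-aware `isFsqrtCheap`) follows the diff, and the presets follow the commit message's description of big cores and Skylake.

```cpp
#include <cassert>

// Hypothetical per-CPU tuning knobs (illustrative names, not LLVM's).
struct CpuModel {
  bool fastScalarSqrt;  // scalar SQRT has lower latency than RSQRT + NR
  bool fastVectorSqrt;  // vector SQRT has higher throughput than RSQRT + NR
};

enum class SqrtLowering { HardwareSqrt, NewtonRaphsonEstimate };

// Operand-aware cheapness query, mirroring the new isFsqrtCheap(N0, DAG)
// shape: scalars are judged on latency, vectors on throughput.
bool isFsqrtCheap(const CpuModel &cpu, bool operandIsVector) {
  return operandIsVector ? cpu.fastVectorSqrt : cpu.fastScalarSqrt;
}

// Control flow of the patched visitFSQRT. Returning "no combine" keeps the
// FSQRT node, modelled here as choosing the hardware instruction.
SqrtLowering chooseSqrtLowering(const CpuModel &cpu, bool unsafeFPMath,
                                bool operandIsVector) {
  if (!unsafeFPMath)
    return SqrtLowering::HardwareSqrt;  // estimate is not correctly rounded
  if (isFsqrtCheap(cpu, operandIsVector))
    return SqrtLowering::HardwareSqrt;
  return SqrtLowering::NewtonRaphsonEstimate;
}

// Hypothetical presets encoding the commit message's description.
constexpr CpuModel kBigCore{true, false};      // scalar NR disabled
constexpr CpuModel kSkylake{true, true};       // NR disabled completely
constexpr CpuModel kNoFastSqrt{false, false};  // neither flag: NR for both
```

Passing the operand into `isFsqrtCheap` is what makes the scalar/vector split possible: the old parameterless `isFsqrtCheap()` could give only one answer per target.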