author     Chandler Carruth <chandlerc@gmail.com>  2011-01-09 22:36:18 +0000
committer  Chandler Carruth <chandlerc@gmail.com>  2011-01-09 22:36:18 +0000
commit     d011d5317ce2cf9a0b1eaeddb3eae512a2eabfc3 (patch)
tree       e34e51c828c4dfe20e3a1dda4fe75a3ad9fe06c9 /llvm/lib/Target
parent     96df71ad90b2e59943d0c34acf0cd661dbce050b (diff)
Add a note about the inability to model FP -> int conversions which
perform rounding other than truncation in the IR. Common C code for this turns into an LLVM intrinsic call that blocks a lot of further optimizations. llvm-svn: 123135
Diffstat (limited to 'llvm/lib/Target')
-rw-r--r--  llvm/lib/Target/README.txt  55
1 file changed, 55 insertions, 0 deletions
diff --git a/llvm/lib/Target/README.txt b/llvm/lib/Target/README.txt
index a9afffd95ae..bea240c0921 100644
--- a/llvm/lib/Target/README.txt
+++ b/llvm/lib/Target/README.txt
@@ -2273,3 +2273,58 @@ this whenever the floating point type has enough exponent bits to represent
the largest integer value as < inf.
//===---------------------------------------------------------------------===//
+
+clang -O3 currently compiles this code:
+
+#include <emmintrin.h>
+// _mm_cvtsd_si32 (cvtsd2si) converts using the current rounding mode
+// (round-to-nearest by default).
+int f(double x) { return _mm_cvtsd_si32(_mm_set_sd(x)); }
+// _mm_cvttsd_si32 (cvttsd2si) always truncates toward zero.
+int g(double x) { return _mm_cvttsd_si32(_mm_set_sd(x)); }
+
+into
+
+define i32 @_Z1fd(double %x) nounwind readnone {
+entry:
+ %vecinit.i = insertelement <2 x double> undef, double %x, i32 0
+ %vecinit1.i = insertelement <2 x double> %vecinit.i, double 0.000000e+00, i32 1
+ %0 = tail call i32 @llvm.x86.sse2.cvtsd2si(<2 x double> %vecinit1.i) nounwind
+ ret i32 %0
+}
+
+define i32 @_Z1gd(double %x) nounwind readnone {
+entry:
+ %conv.i = fptosi double %x to i32
+ ret i32 %conv.i
+}
+
+This difference carries over to the assembly produced, resulting in:
+
+_Z1fd: # @_Z1fd
+# BB#0: # %entry
+ pushq %rbp
+ movq %rsp, %rbp
+ xorps %xmm1, %xmm1
+ movsd %xmm0, %xmm1
+ cvtsd2sil %xmm1, %eax
+ popq %rbp
+ ret
+
+_Z1gd: # @_Z1gd
+# BB#0: # %entry
+ pushq %rbp
+ movq %rsp, %rbp
+ cvttsd2si %xmm0, %eax
+ popq %rbp
+ ret
+
+The problem is that we can't see through the intrinsic call used for cvtsd2si,
+and fold away the unnecessary manipulation of the function parameter. When
+these functions are inlined, it forms a barrier preventing many further
+optimizations. LLVM IR doesn't have a good way to model the logic of
+'cvtsd2si'; its only FP -> int conversion path forces truncation. We should add
+a rounding flag onto fptosi so that it can represent this type of rounding
+naturally in the IR rather than using intrinsics. We might need to use a
+'system_rounding_mode' flag to encode that the semantics of the rounding mode
+can be changed by the program, but ideally we could just say that isn't
+supported, and hard code the rounding.
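+
+A hypothetical sketch of what that might look like (the rounding flag below is
+invented syntax, not existing LLVM IR), assuming the rounding were hard coded
+to round-to-nearest:
+
+define i32 @_Z1fd(double %x) nounwind readnone {
+entry:
+  ; 'round.nearest' is a made-up flag: convert by rounding to nearest instead
+  ; of truncating toward zero, matching what cvtsd2si does by default.
+  %conv.i = fptosi round.nearest double %x to i32
+  ret i32 %conv.i
+}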
+
+//===---------------------------------------------------------------------===//