| field | value | date |
|---|---|---|
| author | Chris Lattner <sabre@nondot.org> | 2009-02-04 19:08:01 +0000 |
| committer | Chris Lattner <sabre@nondot.org> | 2009-02-04 19:08:01 +0000 |
| commit | 553fd7e1ebababeb1c22526580c406eaef213d98 (patch) | |
| tree | 54575731253ce88b9634b1e3aa53402ee87b39e3 /llvm/lib/Target | |
| parent | ded2d7b0211a96f70392e829b399546453dff40e (diff) | |
add a note, this is why we're faster at SciMark-MonteCarlo with
SSE disabled.
llvm-svn: 63751
Diffstat (limited to 'llvm/lib/Target')
| -rw-r--r-- | llvm/lib/Target/X86/README-SSE.txt | 40 |
1 files changed, 40 insertions, 0 deletions
diff --git a/llvm/lib/Target/X86/README-SSE.txt b/llvm/lib/Target/X86/README-SSE.txt
index bc51b534824..67cad42a354 100644
--- a/llvm/lib/Target/X86/README-SSE.txt
+++ b/llvm/lib/Target/X86/README-SSE.txt
@@ -912,3 +912,43 @@ since we know the stack slot is already zext'd.
 Consider using movlps instead of movsd to implement (scalar_to_vector (loadf64))
 when code size is critical. movlps is slower than movsd on core2 but it's one
 byte shorter.
+
+//===---------------------------------------------------------------------===//
+
+We should use a dynamic programming based approach to tell when using FPStack
+operations is cheaper than SSE. SciMark montecarlo contains code like this
+for example:
+
+double MonteCarlo_num_flops(int Num_samples) {
+  return ((double) Num_samples)* 4.0;
+}
+
+In fpstack mode, this compiles into:
+
+LCPI1_0:
+        .long   1082130432      ## float 4.000000e+00
+_MonteCarlo_num_flops:
+        subl    $4, %esp
+        movl    8(%esp), %eax
+        movl    %eax, (%esp)
+        fildl   (%esp)
+        fmuls   LCPI1_0
+        addl    $4, %esp
+        ret
+
+in SSE mode, it compiles into significantly slower code:
+
+_MonteCarlo_num_flops:
+        subl    $12, %esp
+        cvtsi2sd        16(%esp), %xmm0
+        mulsd   LCPI1_0, %xmm0
+        movsd   %xmm0, (%esp)
+        fldl    (%esp)
+        addl    $12, %esp
+        ret
+
+There are also other cases in scimark where using fpstack is better, it is
+cheaper to do fld1 than load from a constant pool for example, so
+"load, add 1.0, store" is better done in the fp stack, etc.
+
+//===---------------------------------------------------------------------===//
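
The note proposes a dynamic-programming approach without spelling one out. As a minimal sketch of the idea, and not LLVM's implementation, the C code below treats a straight-line chain of floating-point operations as a two-state shortest-path problem: each operation runs on either the x87 stack or SSE, and moving a value between the two costs a fixed penalty (the store and reload through a stack slot seen in the SSE listing). The names `pick_units`, `op_cost`, `MAX_OPS`, and every cost value passed in are hypothetical, chosen for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: two execution units, each op has a cost on both. */
enum unit { UNIT_X87 = 0, UNIT_SSE = 1, UNIT_COUNT = 2 };

struct op_cost {
    unsigned cost[UNIT_COUNT];   /* illustrative cost of this op per unit */
};

#define MAX_OPS 64               /* arbitrary bound for the sketch */

/*
 * Choose a unit for each op in a chain of n ops so that the total cost is
 * minimal. Switching units between adjacent ops costs `xfer`. The final
 * value must end up on `result_unit` (one more `xfer` if it does not).
 * Writes the chosen unit for op i to choice[i] and returns the total cost.
 */
unsigned pick_units(const struct op_cost *ops, size_t n, unsigned xfer,
                    enum unit result_unit, enum unit *choice)
{
    unsigned best[UNIT_COUNT];               /* best cost ending on each unit */
    unsigned char from[MAX_OPS][UNIT_COUNT]; /* back-pointers for traceback */

    assert(n > 0 && n <= MAX_OPS);

    for (int u = 0; u < UNIT_COUNT; u++) {
        best[u] = ops[0].cost[u];
        from[0][u] = (unsigned char)u;
    }

    for (size_t i = 1; i < n; i++) {
        unsigned next[UNIT_COUNT];
        for (int u = 0; u < UNIT_COUNT; u++) {
            int other = 1 - u;
            unsigned stay = best[u];
            unsigned move = best[other] + xfer;
            if (move < stay) {
                next[u] = move + ops[i].cost[u];
                from[i][u] = (unsigned char)other;
            } else {
                next[u] = stay + ops[i].cost[u];
                from[i][u] = (unsigned char)u;
            }
        }
        best[UNIT_X87] = next[UNIT_X87];
        best[UNIT_SSE] = next[UNIT_SSE];
    }

    /* Account for where the caller expects the result to live. */
    int last = (int)result_unit;
    int other = 1 - last;
    unsigned total = best[last];
    if (best[other] + xfer < total) {
        total = best[other] + xfer;
        last = other;
    }

    /* Walk the back-pointers to recover the per-op decisions. */
    for (size_t i = n; i-- > 0; ) {
        choice[i] = (enum unit)last;
        last = from[i][last];
    }
    return total;
}
```

With illustrative costs of {3, 2} for the int-to-double conversion and {2, 2} for the multiply (x87 first, SSE second), a transfer penalty of 4, and the result required on the x87 stack as in the 32-bit calling convention, the sketch keeps both operations on x87 for a total of 5, while the all-SSE path costs 8 once the final transfer is added. A real selector would work on a DAG rather than a chain and would need measured costs.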

