Revamp build_vector lowering to take advantage of movss and movd instructions. - bcm5719-llvm

author	Evan Cheng <evan.cheng@apple.com>	2006-04-21 23:03:30 +0000
committer	Evan Cheng <evan.cheng@apple.com>	2006-04-21 23:03:30 +0000
commit	14215c36b655b6cd5736c27f21e1a1d075a3fa3d (patch)
tree	fb759d57ee3067331b83ccf5e0a4ce056730540b /llvm/lib/CodeGen/MachineCodeEmitter.cpp
parent	57a32f0bc1b9c555fee90a7132007178a29ca150 (diff)

Revamp build_vector lowering to take advantage of movss and movd instructions.

movd always clears the top 96 bits, and movss does so when it loads the value from memory. The net result is that codegen for 4-wide shuffles is much improved. It is near optimal if one or more elements is zero, e.g.

__m128i test(int a, int b) {
  return _mm_set_epi32(0, 0, b, a);
}

compiles to

_test:
        movd 8(%esp), %xmm1
        movd 4(%esp), %xmm0
        punpckldq %xmm1, %xmm0
        ret

Compare to gcc:

_test:
        subl $12, %esp
        movd 20(%esp), %xmm0
        movd 16(%esp), %xmm1
        punpckldq %xmm0, %xmm1
        movq %xmm1, %xmm0
        movhps LC0, %xmm0
        addl $12, %esp
        ret

or icc:

_test:
        movd 4(%esp), %xmm0                 #5.10
        movd 8(%esp), %xmm3                 #5.10
        xorl %eax, %eax                     #5.10
        movd %eax, %xmm1                    #5.10
        punpckldq %xmm1, %xmm0              #5.10
        movd %eax, %xmm2                    #5.10
        punpckldq %xmm2, %xmm3              #5.10
        punpckldq %xmm3, %xmm0              #5.10
        ret                                 #5.10

There is still room for improvement, for example in the FP variant of the above example:

__m128 test(float a, float b) {
  return _mm_set_ps(0.0, 0.0, b, a);
}

_test:
        movss 8(%esp), %xmm1
        movss 4(%esp), %xmm0
        unpcklps %xmm1, %xmm0
        xorps %xmm1, %xmm1
        movlhps %xmm1, %xmm0
        ret

The xorps and movlhps are unnecessary, because the movss loads have already zeroed the upper elements. Removing them will require a post-legalizer optimization.

llvm-svn: 27939
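The zero-extending behavior of movd that the lowering relies on can be checked from C with SSE2 intrinsics. The following is a minimal sketch, not code from this commit: the helper names (set_reference, set_movd_unpack, lanes, same_result) are made up for illustration, and it assumes an x86 target with SSE2 and <emmintrin.h> available. It builds the vector once with _mm_set_epi32 and once with the explicit movd/movd/punpckldq sequence, then compares the lanes.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <string.h>

/* Reference form, as written in the commit message. */
static __m128i set_reference(int a, int b) {
    return _mm_set_epi32(0, 0, b, a);
}

/* Hypothetical hand-written equivalent of the movd/movd/punpckldq sequence. */
static __m128i set_movd_unpack(int a, int b) {
    __m128i lo = _mm_cvtsi32_si128(a);  /* movd: lanes {a, 0, 0, 0} */
    __m128i hi = _mm_cvtsi32_si128(b);  /* movd: lanes {b, 0, 0, 0} */
    return _mm_unpacklo_epi32(lo, hi);  /* punpckldq: lanes {a, b, 0, 0} */
}

/* Copy the four 32-bit lanes out, lowest lane first. */
static void lanes(__m128i v, int32_t out[4]) {
    _mm_storeu_si128((__m128i *)out, v);
}

/* Returns 1 if both constructions produce identical lanes. */
static int same_result(int a, int b) {
    int32_t r[4], s[4];
    lanes(set_reference(a, b), r);
    lanes(set_movd_unpack(a, b), s);
    return memcmp(r, s, sizeof r) == 0;
}
```

Because each movd already leaves zeros in the upper three lanes, interleaving the low lanes of the two registers yields the zero upper half for free, which is why no separate zeroing instruction appears in the integer case.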

Diffstat (limited to 'llvm/lib/CodeGen/MachineCodeEmitter.cpp')

0 files changed, 0 insertions, 0 deletions

