commit eca590ffb3b2f8d73fa93e66dea5b2c380a527df
Author:    Sanjay Patel <spatel@rotateright.com>  2015-04-02 20:21:52 +0000
Committer: Sanjay Patel <spatel@rotateright.com>  2015-04-02 20:21:52 +0000
Tree:      dd20212b5674d73ed088d921d641df3e8d83cb09
Parent:    ff0cf4f56df4177032c6357910ca50d33b77bfcd
[AVX] Improve insertion of i8 or i16 into low element of 256-bit zero vector
Without this patch, we split the 256-bit vector into halves and produced something like:
    movzwl (%rdi), %eax
    vmovd %eax, %xmm0
    vxorps %xmm1, %xmm1, %xmm1
    vblendps $15, %ymm0, %ymm1, %ymm0 ## ymm0 = ymm0[0,1,2,3],ymm1[4,5,6,7]
Now, we eliminate the xor and blend because those zeros are free with the vmovd:
    movzwl (%rdi), %eax
    vmovd %eax, %xmm0
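For context, a minimal C sketch that should exercise this codegen path (this example is not from the commit; the function name and the use of _mm256_set_epi16 are assumptions for illustration):

    #include <immintrin.h>
    #include <stdint.h>

    /* Hypothetical reproducer: load a 16-bit scalar and place it in the
       low element of a 256-bit vector whose other elements are zero.
       With this patch, the zeroing should fold into the vmovd, so no
       separate vxorps/vblendps are emitted. */
    __m256i widen_u16_to_v16i16(const uint16_t *p) {
        return _mm256_set_epi16(0, 0, 0, 0, 0, 0, 0, 0,
                                0, 0, 0, 0, 0, 0, 0, (short)p[0]);
    }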
This should be the final fix needed to resolve PR22685:
https://llvm.org/bugs/show_bug.cgi?id=22685
llvm-svn: 233941