[X86] Remove sse41 specific code from lowering v16i8 multiply - bcm5719-llvm


author	Craig Topper <craig.topper@intel.com>	2018-03-19 17:31:41 +0000
committer	Craig Topper <craig.topper@intel.com>	2018-03-19 17:31:41 +0000
commit	259eaa6e7cd021c75e36da2ea677103a6847eb38 (patch)
tree	8b2e2c1c5876cd31603aa6a0ef31fdf4fa3e0129 /clang/lib/CodeGen/CGBlocks.cpp
parent	634b5baa4ebeff5aaf6a57dae850f3e3272188a3 (diff)

[X86] Remove sse41 specific code from lowering v16i8 multiply

With the SRAs removed from the SSE2 code in D44267, there doesn't appear to be any advantage to the SSE4.1 code. The punpcklbw and pmovsx instructions seem to have the same latency and throughput on most CPUs. The SSE4.1 code also requires moving the upper 64 bits into the lower 64 bits before the sign extend can be done, whereas the punpckhbw in the SSE2 code reads the upper half directly.

llvm-svn: 327869
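For illustration only, the SSE2-style sequence described above can be sketched with compiler intrinsics. This is not the LLVM lowering code itself: the function names `mul_v16i8_sse2`, `mul_v16i8_bytes`, and `self_check` are hypothetical, and the exact shuffle operands the backend picks may differ. The sketch assumes an x86 target with SSE2 and relies on the fact that the low 8 bits of a 16-bit product depend only on the low 8 bits of each input, which is why no arithmetic shift (SRA) is needed.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: element-wise multiply of sixteen 8-bit lanes,
 * keeping the low 8 bits of each product. */
static __m128i mul_v16i8_sse2(__m128i a, __m128i b) {
    /* punpcklbw / punpckhbw: spread each byte into a 16-bit lane.
     * The high byte of each lane is a don't-care, so the input is
     * simply unpacked with itself. */
    __m128i a_lo = _mm_unpacklo_epi8(a, a);
    __m128i b_lo = _mm_unpacklo_epi8(b, b);
    __m128i a_hi = _mm_unpackhi_epi8(a, a);  /* reads the upper 64 bits directly */
    __m128i b_hi = _mm_unpackhi_epi8(b, b);

    /* pmullw: low 16 bits of each 16-bit product. */
    __m128i lo = _mm_mullo_epi16(a_lo, b_lo);
    __m128i hi = _mm_mullo_epi16(a_hi, b_hi);

    /* Keep only the low byte of each lane so the pack cannot saturate. */
    const __m128i mask = _mm_set1_epi16(0x00FF);
    lo = _mm_and_si128(lo, mask);
    hi = _mm_and_si128(hi, mask);

    /* packuswb: narrow the sixteen 16-bit lanes back to bytes. */
    return _mm_packus_epi16(lo, hi);
}

/* Hypothetical convenience wrapper operating on plain byte arrays. */
static void mul_v16i8_bytes(const uint8_t *a, const uint8_t *b, uint8_t *out) {
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, mul_v16i8_sse2(va, vb));
}

/* Compare against a scalar reference for every pair of byte values. */
static void self_check(void) {
    uint8_t a[16], b[16], out[16];
    for (int x = 0; x < 256; ++x) {
        for (int y = 0; y < 256; ++y) {
            for (int i = 0; i < 16; ++i) {
                a[i] = (uint8_t)x;
                b[i] = (uint8_t)(y + i);
            }
            mul_v16i8_bytes(a, b, out);
            for (int i = 0; i < 16; ++i)
                assert(out[i] == (uint8_t)(a[i] * b[i]));
        }
    }
}
```

An SSE4.1 variant would extend the low half with pmovsxbw, but since that instruction only reads the lower 64 bits of its source, the upper half would need an extra shuffle first. That extra step is the cost the commit message refers to.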

Diffstat (limited to 'clang/lib/CodeGen/CGBlocks.cpp')

0 files changed, 0 insertions, 0 deletions

