[x86] Add a much more powerful framework for combining x86 shuffle - bcm5719-llvm


author	Chandler Carruth <chandlerc@gmail.com>	2014-07-27 01:15:58 +0000
committer	Chandler Carruth <chandlerc@gmail.com>	2014-07-27 01:15:58 +0000
commit	80c5bfd8436f9159769b66667019688c681db0cd (patch)
tree	41f06d16f2ac6f4983fa753f7b988118bea9a509 /clang/lib/Serialization/ASTWriter.cpp
parent	3ea985b3758b6b8ca33a54e82d6fedc77ed53f9e (diff)
download	bcm5719-llvm-80c5bfd8436f9159769b66667019688c681db0cd.tar.gz bcm5719-llvm-80c5bfd8436f9159769b66667019688c681db0cd.zip

[x86] Add a much more powerful framework for combining x86 shuffle

instructions in the legalized DAG, and leverage it to combine long sequences of instructions to PSHUFB.

Eventually, the other x86-instruction-specific shuffle combines will probably all be driven out of this routine. But the real motivation is to detect, after we have fully legalized and optimized a shuffle to the minimal number of x86 instructions, whether it is profitable to replace the chain with a fully generic PSHUFB instruction, even though doing so requires either a load from a constant pool or tying up a register with the mask.

While the Intel manuals claim it should be used when it replaces 5 or more instructions (!!!!) my experience is that it is actually very fast on modern chips, and so I've gone with a much more aggressive model of replacing any sequence of 3 or more instructions.

I've also taught it to do some basic canonicalization to special-purpose instructions which have smaller encodings than their generic counterparts.

There are still quite a few FIXMEs here, and I've not yet implemented support for lowering blends with PSHUFB (where its power really shines due to being able to zero out lanes), but this starts implementing real PSHUFB support even when using the new, fancy shuffle lowering. =]

llvm-svn: 214042
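To illustrate the idea of collapsing a chain of shuffles into a single byte-level mask, here is a minimal standalone sketch. It is not the code from this patch and does not use LLVM's SelectionDAG API: the names `ByteMask`, `composeMasks`, `combineChain`, and `kMinChainLength` are hypothetical, the vector is assumed to be 16 bytes wide, and `-1` is an assumed stand-in for a lane that PSHUFB would zero. It requires C++17 for `std::optional`.

```cpp
#include <array>
#include <cstddef>
#include <optional>
#include <vector>

// One byte-level shuffle mask for a 16-byte vector: entry i names the source
// byte for result byte i. A value of -1 is this sketch's convention for
// "zero this byte" (an assumption modelling PSHUFB's lane zeroing).
using ByteMask = std::array<int, 16>;

// Hypothetical threshold mirroring the "3 or more instructions" model
// described in the commit message.
constexpr std::size_t kMinChainLength = 3;

// Compose two shuffles applied in order: `first`, then `second`.
// Result byte i reads byte second[i] of the intermediate value, which in
// turn came from byte first[second[i]] of the original input.
ByteMask composeMasks(const ByteMask &first, const ByteMask &second) {
  ByteMask result{};
  for (std::size_t i = 0; i < result.size(); ++i) {
    int src = second[i];
    // A zeroed lane stays zeroed regardless of the earlier shuffle.
    result[i] = (src < 0) ? -1 : first[static_cast<std::size_t>(src)];
  }
  return result;
}

// Fold a chain of shuffles (in application order) into one mask, but only
// when the chain is long enough for a single generic shuffle to pay off.
std::optional<ByteMask> combineChain(const std::vector<ByteMask> &chain) {
  if (chain.size() < kMinChainLength)
    return std::nullopt;
  ByteMask combined = chain.front();
  for (std::size_t i = 1; i < chain.size(); ++i)
    combined = composeMasks(combined, chain[i]);
  return combined;
}
```

In this sketch a chain of two shuffles is left alone, while a chain of three or more is folded into one mask that a single generic byte shuffle could apply. The real combine also has to weigh the cost of materializing the mask (a constant-pool load or an occupied register), which the sketch ignores.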

Diffstat (limited to 'clang/lib/Serialization/ASTWriter.cpp')

0 files changed, 0 insertions, 0 deletions

