path: root/clang/lib/Serialization/ASTWriter.cpp
author: Chandler Carruth <chandlerc@gmail.com> 2014-07-27 01:15:58 +0000
committer: Chandler Carruth <chandlerc@gmail.com> 2014-07-27 01:15:58 +0000
commit: 80c5bfd8436f9159769b66667019688c681db0cd
tree: 41f06d16f2ac6f4983fa753f7b988118bea9a509 /clang/lib/Serialization/ASTWriter.cpp
parent: 3ea985b3758b6b8ca33a54e82d6fedc77ed53f9e

[x86] Add a much more powerful framework for combining x86 shuffle instructions in the legalized DAG, and leverage it to combine long sequences of instructions to PSHUFB.

Eventually, the other x86-instruction-specific shuffle combines will probably all be driven out of this routine. But the real motivation is to detect, after we have fully legalized and optimized a shuffle to the minimal number of x86 instructions, whether it is profitable to replace the chain with a fully generic PSHUFB instruction, even though doing so requires either a load from a constant pool or tying up a register with the mask.

While the Intel manuals claim it should be used when it replaces 5 or more instructions (!!!!), my experience is that it is actually very fast on modern chips, and so I've gone with a much more aggressive model of replacing any sequence of 3 or more instructions.

I've also taught it to do some basic canonicalization to special-purpose instructions which have smaller encodings than their generic counterparts.

There are still quite a few FIXMEs here, and I've not yet implemented support for lowering blends with PSHUFB (where its power really shines due to being able to zero out lanes), but this starts implementing real PSHUFB support even when using the new, fancy shuffle lowering. =]

llvm-svn: 214042

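The idea of folding a chain of shuffles into one PSHUFB can be illustrated outside of LLVM. The following is only a minimal sketch, not the code from this commit: it models 128-bit PSHUFB on plain byte arrays, composes a chain of single-input byte shuffles into one control mask, and applies the "3 or more instructions" threshold described above. The names `Bytes`, `pshufb`, `composeMasks`, and `shouldUsePshufb` are hypothetical and do not correspond to anything in the X86 backend.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a 128-bit vector as 16 bytes.
using Bytes = std::array<std::uint8_t, 16>;

// Emulates 128-bit PSHUFB: a control byte with its high bit set zeroes the
// lane; otherwise its low four bits select a byte of the source.
Bytes pshufb(const Bytes &src, const Bytes &mask) {
  Bytes out{};
  for (std::size_t i = 0; i < 16; ++i)
    out[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 0x0F];
  return out;
}

// Folds a chain of single-input byte shuffles (applied first to last) into a
// single control mask. Zeroed lanes (high bit set) propagate through the chain.
Bytes composeMasks(const std::vector<Bytes> &chain) {
  Bytes combined{};
  for (std::size_t i = 0; i < 16; ++i)
    combined[i] = static_cast<std::uint8_t>(i); // start from the identity
  for (const Bytes &step : chain) {
    Bytes next{};
    for (std::size_t i = 0; i < 16; ++i)
      next[i] = (step[i] & 0x80) ? 0x80 : combined[step[i] & 0x0F];
    combined = next;
  }
  return combined;
}

// Assumed profitability rule, taken from the commit message: replace any
// sequence of 3 or more shuffle instructions with one PSHUFB.
bool shouldUsePshufb(std::size_t chainLength) { return chainLength >= 3; }
```

Applying `pshufb` once with the mask from `composeMasks` gives the same bytes as applying each shuffle in the chain in turn, which is why a long chain can be swapped for one instruction plus a constant mask.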
Diffstat (limited to 'clang/lib/Serialization/ASTWriter.cpp')
0 files changed, 0 insertions, 0 deletions