author: Craig Topper <craig.topper@intel.com> 2018-03-31 04:54:32 +0000
committer: Craig Topper <craig.topper@intel.com> 2018-03-31 04:54:32 +0000
commit: 13a0f83a05ff46341b722d9e6fabe3f32443a3e1
tree: 82eb11f2d69f927cabb2fd5d38dea9111cbfa2e2 /llvm/lib/Transforms
parent: 96871864d2433f98b643a687b8981beba19d3bc3
[X86] Add SchedRW for PMULLD
Summary:
It seems many CPUs don't implement this instruction as well as the other vector multiplies, often using a multi-uop flow. Silvermont in particular has a 7-uop flow with 11-cycle throughput. Sandy Bridge implements it as a single uop with 5-cycle latency and 1-cycle throughput, but Haswell and later use 2 uops with 10-cycle latency and 2-cycle throughput.

This patch adds a new X86SchedWritePair that we can use to tag this instruction separately. I've provided correct information for Silvermont, Btver2, and Sandy Bridge, and removed the InstRWs for SandyBridge. I've left the Haswell/Broadwell/Skylake InstRWs in place because I wasn't sure how to account for the different load latency between 128 and 256 bits. I also left the Znver1 InstRWs in place because the existing values don't match Agner's spreadsheet. I also left a FIXME in the SandyBridge model: using it as the "generic" model is too optimistic for the 256/512-bit versions, since those are multiple uops on all known CPUs.

Reviewers: RKSimon, GGanesh, courbet

Reviewed By: RKSimon

Subscribers: gchatelet, gbedwell, andreadb, llvm-commits

Differential Revision: https://reviews.llvm.org/D44972

llvm-svn: 328914
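To make the cycle counts in the summary concrete, here is a rough Python sketch that tabulates only the figures quoted in the commit message and derives two back-of-the-envelope estimates from them. It is not LLVM's scheduling model and not the TableGen added by the patch. The names `PmulldCost`, `COSTS`, `dependent_chain_cycles`, and `independent_cycles` are hypothetical, and reading "N cycle throughput" as reciprocal throughput (cycles between independent instructions) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PmulldCost:
    """PMULLD cost figures as quoted in the commit message (hypothetical container)."""
    uops: int
    latency: Optional[int]   # cycles; None where the message gives no latency
    recip_throughput: int    # assumed meaning: cycles between independent instructions


# Only the numbers stated in the commit message; nothing here comes from the .td files.
COSTS = {
    "Silvermont": PmulldCost(uops=7, latency=None, recip_throughput=11),
    "SandyBridge": PmulldCost(uops=1, latency=5, recip_throughput=1),
    "Haswell": PmulldCost(uops=2, latency=10, recip_throughput=2),
}


def dependent_chain_cycles(cost: PmulldCost, n: int) -> int:
    """Latency-bound estimate: each multiply waits for the previous result."""
    if cost.latency is None:
        raise ValueError("no latency figure available for this CPU")
    return n * cost.latency


def independent_cycles(cost: PmulldCost, n: int) -> int:
    """Throughput-bound estimate: multiplies do not depend on each other."""
    return n * cost.recip_throughput


if __name__ == "__main__":
    for name, cost in COSTS.items():
        chain = "n/a" if cost.latency is None else dependent_chain_cycles(cost, 4)
        print(f"{name}: chain of 4 = {chain}, 4 independent = {independent_cycles(cost, 4)}")
```

Under these assumptions, a chain of four dependent multiplies is estimated at 20 cycles with the Sandy Bridge figures and 40 with the Haswell figures. This illustrates why a single shared scheduling class for all vector multiplies would misjudge PMULLD on the newer cores.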
Diffstat (limited to 'llvm/lib/Transforms')
0 files changed, 0 insertions, 0 deletions