| author | Craig Topper <craig.topper@intel.com> | 2018-03-31 04:54:32 +0000 |
|---|---|---|
| committer | Craig Topper <craig.topper@intel.com> | 2018-03-31 04:54:32 +0000 |
| commit | 13a0f83a05ff46341b722d9e6fabe3f32443a3e1 (patch) | |
| tree | 82eb11f2d69f927cabb2fd5d38dea9111cbfa2e2 /llvm/lib/Demangle/ItaniumDemangle.cpp | |
| parent | 96871864d2433f98b643a687b8981beba19d3bc3 (diff) | |
[X86] Add SchedRW for PMULLD
Summary:
It seems many CPUs don't implement this instruction as well as the other vector multiplies, often using a multi-uop flow. Silvermont in particular has a 7-uop flow with 11-cycle throughput. Sandy Bridge implements it as a single uop with 5-cycle latency and 1-cycle throughput, but Haswell and later use 2 uops with 10-cycle latency and 2-cycle throughput.
This patch adds a new X86SchedWritePair we can use to tag this instruction separately. I've provided correct information for Silvermont, Btver2, and Sandy Bridge. I've removed the InstRWs for SandyBridge. I've left Haswell/Broadwell/Skylake InstRWs in place because I wasn't sure how to account for the different load latency between 128 and 256 bits. I also left Znver1 InstRWs in place because the existing values don't match Agner's spreadsheet.
I also left a FIXME in the SandyBridge model because using it as the "generic" model is too optimistic for the 256/512-bit versions, since those are multiple uops on all known CPUs.
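To make the per-CPU figures quoted in the summary concrete, here is a rough Python sketch (not part of the patch, and not LLVM's scheduler). It assumes that "N cycle throughput" in the message means a reciprocal throughput of N cycles per instruction, and it uses hypothetical names (`PmulldCost`, `COSTS`, `independent_cycles`, `dependent_chain_cycles`) purely for illustration. The Silvermont latency is left unset because the message does not state it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PmulldCost:
    """Illustrative cost record for PMULLD on one CPU (hypothetical type)."""
    uops: int
    latency: Optional[int]   # cycles; None where the commit message gives no figure
    recip_throughput: int    # assumed meaning of "N cycle throughput" in the message


# Figures copied from the commit message; nothing here is measured.
COSTS = {
    "Silvermont": PmulldCost(uops=7, latency=None, recip_throughput=11),
    "SandyBridge": PmulldCost(uops=1, latency=5, recip_throughput=1),
    "Haswell": PmulldCost(uops=2, latency=10, recip_throughput=2),
}


def independent_cycles(cpu: str, n: int) -> int:
    """Rough cycle estimate for n independent PMULLDs (throughput-bound)."""
    return COSTS[cpu].recip_throughput * n


def dependent_chain_cycles(cpu: str, n: int) -> int:
    """Rough cycle estimate for n PMULLDs where each consumes the previous result."""
    cost = COSTS[cpu]
    if cost.latency is None:
        raise ValueError(f"no latency figure given for {cpu}")
    return cost.latency * n


if __name__ == "__main__":
    for cpu in COSTS:
        print(cpu, "independent x8:", independent_cycles(cpu, 8))
```

Under these assumptions, a chain of dependent multiplies costs twice as much on Haswell as on Sandy Bridge, which is the kind of difference a separate scheduling class lets each CPU model express.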
Reviewers: RKSimon, GGanesh, courbet
Reviewed By: RKSimon
Subscribers: gchatelet, gbedwell, andreadb, llvm-commits
Differential Revision: https://reviews.llvm.org/D44972
llvm-svn: 328914
Diffstat (limited to 'llvm/lib/Demangle/ItaniumDemangle.cpp')
0 files changed, 0 insertions, 0 deletions

