| author | Roman Lebedev <lebedev.ri@gmail.com> | 2018-06-20 07:01:14 +0000 | 
|---|---|---|
| committer | Roman Lebedev <lebedev.ri@gmail.com> | 2018-06-20 07:01:14 +0000 | 
| commit | d23b6831deb70a7a3cce0d35b25dcfeaad206127 (patch) | |
| tree | a2598743b2898591591597582210ca320e08e05f /llvm | |
| parent | e0aa30008fdaad02bf4ff1871e3a56b51bf33489 (diff) | |
| download | bcm5719-llvm-d23b6831deb70a7a3cce0d35b25dcfeaad206127.tar.gz bcm5719-llvm-d23b6831deb70a7a3cce0d35b25dcfeaad206127.zip  | |
[X86][Znver1] Specify Register Files, RCU; FP scheduler capacity.
Summary:
First off: I do not have access to that processor,
so this is purely theoretical; there are no benchmarks.
I have been looking into the b**d**ver2 scheduling profile, and while cross-referencing
the existing b**t**ver2 and znver1 profiles against the reference docs
(`Software Optimization Guide for AMD Family {15,16,17}h Processors`),
I noticed that only the b**t**ver2 scheduling profile specifies the register files,
the retire control unit (RCU), and the FP scheduler capacity.
Also, there is no llvm-mca test coverage for these on znver1.
Reviewers: RKSimon, craig.topper, courbet, GGanesh, andreadb
Reviewed By: GGanesh
Subscribers: gbedwell, vprasad, ddibyend, shivaram, Ashutosh, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D47676
llvm-svn: 335099
Diffstat (limited to 'llvm')
| -rw-r--r-- | llvm/lib/Target/X86/X86ScheduleZnver1.td | 27 |
| -rw-r--r-- | llvm/test/tools/llvm-mca/X86/register-file-statistics.s | 10 |
| -rw-r--r-- | llvm/test/tools/llvm-mca/X86/scheduler-queue-usage.s | 1 |
3 files changed, 37 insertions, 1 deletion
diff --git a/llvm/lib/Target/X86/X86ScheduleZnver1.td b/llvm/lib/Target/X86/X86ScheduleZnver1.td
index ec7c861efa6..cf5fe652163 100644
--- a/llvm/lib/Target/X86/X86ScheduleZnver1.td
+++ b/llvm/lib/Target/X86/X86ScheduleZnver1.td
@@ -55,7 +55,6 @@ def ZnFPU2 : ProcResource<1>;
 def ZnFPU3 : ProcResource<1>;
 
 // FPU grouping
-def ZnFPU     : ProcResGroup<[ZnFPU0, ZnFPU1, ZnFPU2, ZnFPU3]>;
 def ZnFPU013  : ProcResGroup<[ZnFPU0, ZnFPU1, ZnFPU3]>;
 def ZnFPU01   : ProcResGroup<[ZnFPU0, ZnFPU1]>;
 def ZnFPU12   : ProcResGroup<[ZnFPU1, ZnFPU2]>;
@@ -91,6 +90,32 @@ def ZnDivider : ProcResource<1>;
 // 4 Cycles load-to use Latency is captured
 def : ReadAdvance<ReadAfterLd, 4>;
 
+// The Integer PRF for Zen is 168 entries, and it holds the architectural and
+// speculative version of the 64-bit integer registers.
+// Reference: "Software Optimization Guide for AMD Family 17h Processors"
+def ZnIntegerPRF : RegisterFile<168, [GR8, GR16, GR32, GR64, CCR]>;
+
+// 36 Entry (9x4 entries) floating-point Scheduler
+def ZnFPU     : ProcResGroup<[ZnFPU0, ZnFPU1, ZnFPU2, ZnFPU3]> {
+let BufferSize=36;
+}
+
+// The Zen FP Retire Queue renames SIMD and FP uOps onto a pool of 160 128-bit
+// registers. Operations on 256-bit data types are cracked into two COPs.
+// Reference: "Software Optimization Guide for AMD Family 17h Processors"
+def ZnFpuPRF: RegisterFile<160, [VR64, VR128, VR256], [1, 1, 2]>;
+
+// The unit can track up to 192 macro ops in-flight.
+// The retire unit handles in-order commit of up to 8 macro ops per cycle.
+// Reference: "Software Optimization Guide for AMD Family 17h Processors"
+// To be noted, the retire unit is shared between integer and FP ops.
+// In SMT mode it is 96 entries per thread. But, we do not use the conservative
+// value here because there is currently no way to fully model the SMT mode,
+// so there is no point in trying.
+def ZnRCU : RetireControlUnit<192, 8>;
+
+// FIXME: there are 72 read buffers and 44 write buffers.
+
 // (a folded load is an instruction that loads and does some operation)
 // Ex: ADDPD xmm,[mem]-> This instruction has two micro-ops
 // Instructions with folded loads are usually micro-fused, so they only appear
diff --git a/llvm/test/tools/llvm-mca/X86/register-file-statistics.s b/llvm/test/tools/llvm-mca/X86/register-file-statistics.s
index c35edf5af55..858da0e3a4f 100644
--- a/llvm/test/tools/llvm-mca/X86/register-file-statistics.s
+++ b/llvm/test/tools/llvm-mca/X86/register-file-statistics.s
@@ -21,7 +21,17 @@ xor %eax, %ebx
 # BTVER2-NEXT:        Total number of mappings created: 0
 # BTVER2-NEXT:        Max number of mappings used:      0
 
+# ZNVER1:          *  Register File #1 -- ZnFpuPRF:
+# ZNVER1-NEXT:        Number of physical registers:     160
+# ZNVER1-NEXT:        Total number of mappings created: 0
+# ZNVER1-NEXT:        Max number of mappings used:      0
+
 # BTVER2:          *  Register File #2 -- JIntegerPRF:
 # BTVER2-NEXT:        Number of physical registers:     64
 # BTVER2-NEXT:        Total number of mappings created: 2
 # BTVER2-NEXT:        Max number of mappings used:      2
+
+# ZNVER1:          *  Register File #2 -- ZnIntegerPRF:
+# ZNVER1-NEXT:        Number of physical registers:     168
+# ZNVER1-NEXT:        Total number of mappings created: 2
+# ZNVER1-NEXT:        Max number of mappings used:      2
diff --git a/llvm/test/tools/llvm-mca/X86/scheduler-queue-usage.s b/llvm/test/tools/llvm-mca/X86/scheduler-queue-usage.s
index 4cf9e6a0230..8448960c67e 100644
--- a/llvm/test/tools/llvm-mca/X86/scheduler-queue-usage.s
+++ b/llvm/test/tools/llvm-mca/X86/scheduler-queue-usage.s
@@ -49,3 +49,4 @@ xor %eax, %ebx
 # ZNVER1:          Scheduler's queue usage:
 # ZNVER1-NEXT:     ZnAGU,  0/28
 # ZNVER1-NEXT:     ZnALU,  1/56
+# ZNVER1-NEXT:     ZnFPU,  0/36
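To see the effect of the new resource definitions outside of `lit`, the two statistics tests can be driven through llvm-mca by hand. The sketch below is illustrative only: the `-register-file-stats`, `-scheduler-stats`, and `-iterations=1` flags are assumed to match the RUN lines of the existing tests, and the paths are relative to the LLVM source tree.

```sh
# Illustrative invocations (flag names assumed from the tests' RUN lines).
# Register file statistics for the znver1 model (ZnFpuPRF / ZnIntegerPRF):
llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=znver1 -iterations=1 \
    -register-file-stats llvm/test/tools/llvm-mca/X86/register-file-statistics.s \
  | FileCheck --check-prefix=ZNVER1 llvm/test/tools/llvm-mca/X86/register-file-statistics.s

# Scheduler queue usage, which now reports the 36-entry ZnFPU buffer:
llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=znver1 -iterations=1 \
    -scheduler-stats llvm/test/tools/llvm-mca/X86/scheduler-queue-usage.s \
  | FileCheck --check-prefix=ZNVER1 llvm/test/tools/llvm-mca/X86/scheduler-queue-usage.s
```

Dropping the `FileCheck` pipe prints the register-file and scheduler-queue sections directly, which is a quick way to eyeball the 160/168 physical-register counts and the `ZnFPU, 0/36` queue entry checked above.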

