summaryrefslogtreecommitdiffstats
path: root/doc/nvlink.rst
blob: f33574a8feacb9ad67128ea79c639b5819535a93 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
.. _nvlink:

OPAL/Skiboot Nvlink Interface Documentation
===========================================

Overview
--------

NV-Link is a high speed interconnect that is used in conjunction with
a PCI-E connection to create an interface between chips that provides
very high data bandwidth. The PCI-E connection is used as the control
path to initiate and report status of large data transfers. The data
transfers themselves are sent over the NV-Link.

On IBM Power systems the NV-Link hardware is similar to our standard
PCI hardware so to maximise code reuse the NV-Link is exposed as an
emulated PCI device through system firmware (OPAL/skiboot). Thus each
NV-Link capable device will appear as two devices on a system, the
real PCI-E device and at least one emulated PCI device used for the
NV-Link.

Presently the NV-Link is only capable of data transfers initiated by
the target, thus the emulated PCI device will only handle registers
for link initialisation, DMA transfers and error reporting (EEH).

Emulated PCI Devices
--------------------

Each link will be exported as an emulated PCI device with a minimum of
two emulated PCI devices per GPU. Emulated PCI devices are grouped per
GPU.

The emulated PCI device will be exported as a standard PCI device by
the Linux kernel. It has a standard PCI configuration space to expose
necessary device parameters. The only functionality available is
related to the setup of DMA windows.

Configuration Space Parameters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

============ =================== =====
============ =================== =====
Vendor ID    0x1014              (IBM)
Device ID    0x04ea
Revision ID  0x00
Class        0x068000 / 0x068001 (Bridge Device Other, ProgIf = 0x0 / 0x1)
BAR0/1       TL/DL Registers
BAR2/3       GEN-ID Registers    (Only for rev-id = 0x1)
============ =================== =====

TL/DL Registers
^^^^^^^^^^^^^^^

Each link has 128KB of TL/DL registers. These will always be mapped
to 64-bit BAR#0 of the emulated PCI device configuration space. ::

 BAR#0 + 128K +-----------+
       	     | NTL (64K) |
 BAR#0 + 64K  +-----------+
      	     | DL (64K)  |
 BAR#0	     +-----------+

Generation Registers (GEN-ID)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

On POWER9 each link has 64K of generation ID registers for the relaxed
ordering mode syncronisation. Refer to the programming guide for
details of the register layout in this BAR.

Relaxed ordering mode will be disabled by default as it requires
device driver support. Device drivers will need to request relaxed
ordering mode through some yet to be designed mechanism.

Vendor Specific Capabilities
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
::

 +-----------------+----------------+----------------+----------------+
 |  Version (0x02) |   Cap Length   |  Next Cap Ptr  |  Cap ID (0x09) |
 +-----------------+----------------+----------------+----------------+
 |                      Procedure Status Register                     |
 +--------------------------------------------------------------------+
 |                      Procedure Control Register                    |
 +---------------------------------------------------+----------------+
 |             Reserved            |   PCI Dev Flag  |   Link Number  |
 +---------------------------------------------------+----------------+

Version

   This refers to the version of the NPU config space.  Used by device
   drivers to determine which fields of the config space they can
   expect to be available.

Procedure Control Register

   Used to start hardware procedures.

   Writes will start the corresponding procedure and set bit 31 in the
   procedure status register. This register must not be written while
   bit 31 is set in the status register. Performing a write while
   another procudure is already in progress will abort that procedure.

   Reads will return the in progress procedure or the last completed
   procedure number depending on the procedure status field.

   Procedure Numbers:

    0. Abort in-progress procedure
    1. NOP
    2. Unsupported procedure
    3. Unsupported procedure
    4. Naples PHY - RESET
    5. Naples PHY - TX_ZCAL
    6. Naples PHY - RX_DCCAL
    7. Naples PHY - TX_RXCAL_ENABLE
    8. Naples PHY - TX_RXCAL_DISABLE
    9. Naples PHY - RX_TRAINING
    10. Naples NPU - RESET
    11. Naples PHY - PHY preterminate
    12. Naples PHY - PHY terminated

   Procedure 5 (TX_ZCAL) should only be run once. System firmware will
   ensure this so device drivers may call this procedure mutiple
   times.

Procedure Status Register

   The procedure status register is used to determine when execution
   of the procedure number in the control register is complete and if
   it completed successfully.

   This register must be polled frequently to allow system firmware to
   execute the procedures.

   Fields:
       Bit 31 - Procedure in progress
       Bit 30 - Procedure complete
       Bit 3-0 - Procedure completion code

   Procedure completion codes:
       0 - Procedure completed successfully.
       1 - Transient failure. Procedure should be rerun.
       2 - Permanent failure. Procedure will never complete successfully.
       3 - Procedure aborted.
       4 - Unsupported procedure.

PCI Device Flag

   Bit 0 - set if the GPU PCIe device associated with this nvlink was found.
   bit 1 - set if the DL has been taken out of reset.

Link Number

   Physical link number this emulated PCI device is associated
   with. One of 0, 1, 4 or 5 (links 2 & 3 do not exist on Naples).

Reserved

   These fields must be ignored and no value should be assumed.

Interrupts
^^^^^^^^^^

Each link has a single DL/TL interrupt assigned to it. These will be
exposed as an LSI via the emulated PCI device. There are 4 links
consuming 4 LSI interrupts. The 4 remaining interrupts supported by the
corresponding PHB will be routed to OS platform for the purpose of error
reporting.

Device Tree Bindings
--------------------

See :ref:`device-tree/nvlink`
OpenPOWER on IntegriCloud