From a6837b5a758cec765237488e6823ed88844c8ac5 Mon Sep 17 00:00:00 2001
From: Jeremy Kerr
Date: Fri, 8 Feb 2019 14:29:34 +0800
Subject: designs: Add MCTP design document

Add initial design document, as posted on the openbmc list, and
incorporating feedback from:

 - Emily Shaffer
 - Nilan Naidoo
 - Supreeth Venkatesh
 - Ed Tanous

Change-Id: I9731b2ca9752bdf826e0e98486cfbc3393336043
Signed-off-by: Jeremy Kerr
---
 designs/mctp-standards.svg | 536 +++++++++++++++++++++++++++++++++++++++++++++
 designs/mctp.md            | 216 ++++++++++++++++++
 2 files changed, 752 insertions(+)
 create mode 100644 designs/mctp-standards.svg
 create mode 100644 designs/mctp.md

diff --git a/designs/mctp-standards.svg b/designs/mctp-standards.svg
new file mode 100644
index 0000000..cf5f339
--- /dev/null
+++ b/designs/mctp-standards.svg
@@ -0,0 +1,536 @@
[SVG diagram, markup omitted. The figure shows the DMTF PMCI standards stack:
MCTP Base (DSP0236) with its transport bindings - PCIe VDM (DSP0238), serial
(DSP0253) and an as-yet-unspecified LPC binding - and, layered above it via
the PLDM-over-MCTP binding (DSP0241), PLDM Base (DSP0240) and its profiles:
BIOS configuration & control (DSP0247), platform monitoring & control
(DSP0248), FRU data (DSP0257), firmware update (DSP0267) and the Redfish PLDM
variant (DSP0218, work in progress).]

diff --git a/designs/mctp.md b/designs/mctp.md
new file mode 100644
index 0000000..6985cef
--- /dev/null
+++ b/designs/mctp.md
@@ -0,0 +1,216 @@
# OpenBMC platform communication channel: MCTP & PLDM

Author: Jeremy Kerr

## Problem Description

Currently, we have a few different methods of communication between host
and BMC. This is primarily IPMI-based, but also includes a few
hardware-specific side-channels, like hiomap. On OpenPOWER hardware at
least, we've definitely started to hit some of the limitations of IPMI
(for example, we need more than 255 sensors), as well as of the hardware
channels that IPMI typically uses.

This design aims to use the Management Component Transport Protocol
(MCTP) to provide a common transport layer over the multiple channels
that OpenBMC platforms provide. Then, on top of MCTP, we have the
opportunity to move to newer host/BMC messaging protocols to overcome
some of the limitations we've encountered with IPMI.

## Background and References

Separating the "transport" and "messaging protocol" parts of the current
stack allows us to design these parts separately. Currently, IPMI defines
both of these: we have BT and KCS (both defined as part of the IPMI 2.0
standard) as the transports, and IPMI itself as the messaging protocol.

Some efforts at improving the hardware transport mechanism of IPMI have
been attempted, but not in a cross-implementation manner so far. Nor do
those efforts address the limitations of the IPMI data model.

MCTP defines a standard transport protocol, plus a number of separate
physical layer bindings for the actual transport of MCTP packets. These
are defined by the DMTF's Platform Management Working Group; the
standards are available at:

  https://www.dmtf.org/standards/pmci

The following diagram shows how these standards map to the areas of
functionality that we may want to implement for OpenBMC. The DSP numbers
provided are references to DMTF standard documents.
![](mctp-standards.svg)

One of the key concepts here is the separation of the transport protocol
from the physical layer bindings: an MCTP "stack" may use an I2C, PCIe,
serial or custom hardware channel without the higher layers of that
stack needing to be aware of the hardware implementation. These higher
layers only need to be aware that they are communicating with a certain
endpoint, identified by an Endpoint ID (MCTP EID). These endpoints may
be any element of the platform that communicates over MCTP - for
example, the host device, the BMC, or any other system peripheral -
static or hot-pluggable.

This document is focussed on the "transport" part of the platform
design. While this does enable new messaging protocols (mainly PLDM),
those components are not covered here in much detail; we will propose
them in separate design efforts. For example, see the PLDM design in
[pldm-stack.md](pldm-stack.md).

In this design, the references to MCTP "messages" and "packets" are
intentional, to match the definitions in the MCTP standard. MCTP
messages are the higher-level data transferred between MCTP endpoints,
while packets are typically smaller and are what is sent over the
hardware. Messages that are larger than the hardware Maximum
Transmission Unit (MTU) are split into individual packets by the
transmit implementation, and reassembled at the receive implementation.

## Requirements

Any channel between host and BMC should:

 - Have a simple serialisation and deserialisation format, to enable
   implementations in host firmware, which have widely varying runtime
   capabilities

 - Allow different hardware channels, as we have a wide variety of
   target platforms for OpenBMC

 - Be usable over simple hardware implementations, but have a facility
   for higher-bandwidth messaging on platforms that require it

 - Ideally, integrate with newer messaging protocols

## Proposed Design

The MCTP core specification just provides the packetisation, routing and
addressing mechanisms. The actual transmit/receive of those packets is
up to the hardware binding of the MCTP transport.

For OpenBMC, we would introduce an "MCTP+applications" daemon, which
implements the transport over a configurable hardware channel (eg,
serial UART, I2C or PCIe), and provides handlers for any incoming MCTP
application requests. This daemon is responsible for the packetisation
and routing of MCTP messages from external endpoints, and for handling
the application layer requests.

This daemon has a few components:

 1) the core MCTP stack

 2) one or more binding implementations (eg, MCTP-over-serial), which
    interact with the hardware channel(s)

 3) one or more MCTP message handlers (eg, PLDM or NVMe-MI), to handle
    incoming MCTP messages of specific types

 4) the core application, consisting of the main loop, handler
    management and MCTP binding management

The proposed implementation here is to produce an MCTP "library" which
provides the packetisation and routing functions between two interfaces
(a rough sketch of both follows the list):

 - an "upper" messaging transmit/receive interface, for tx/rx of a full
   message to a specific endpoint (ie, (1) above)

 - a "lower" hardware binding for transmit/receive of individual
   packets, providing a method for the core to tx/rx each packet to
   hardware, and defining the parameters of the common packetisation
   code (ie, (2) above)
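The sketch below shows one possible shape for these two interfaces in
the proposed portable-C library. Every name and signature here is an
illustrative assumption rather than a settled API; it is included only
to make the upper/lower split concrete.

```c
/*
 * Illustrative sketch only: all names and signatures below are
 * hypothetical, showing the shape of the proposed library rather than
 * its actual interface.
 */
#include <stddef.h>
#include <stdint.h>

typedef uint8_t mctp_eid_t;

struct mctp;	/* opaque core context: packetisation, routing, reassembly */

/* "Upper" interface: tx/rx of full messages to/from a remote endpoint.
 * Handlers register for a specific MCTP message type (eg, PLDM). */
typedef void (*mctp_msg_rx_fn)(mctp_eid_t src, uint8_t msg_type,
			       const void *msg, size_t len, void *data);

struct mctp *mctp_init(void);
int mctp_register_msg_handler(struct mctp *mctp, uint8_t msg_type,
			      mctp_msg_rx_fn fn, void *data);
int mctp_msg_tx(struct mctp *mctp, mctp_eid_t dest, uint8_t msg_type,
		const void *msg, size_t len);

/*
 * "Lower" interface: a hardware binding describes its per-packet limits
 * and provides a packet transmit function; the core calls pkt_tx() to
 * send each packet, and the binding calls mctp_pkt_rx() as packets
 * arrive from the hardware.
 */
struct mctp_binding {
	const char *name;
	size_t pkt_size;	/* per-packet MTU for this physical channel */
	int (*pkt_tx)(struct mctp_binding *binding,
		      const void *pkt, size_t len);
	void *binding_data;	/* private to the binding implementation */
};

int mctp_register_binding(struct mctp *mctp, struct mctp_binding *binding);
void mctp_pkt_rx(struct mctp *mctp, struct mctp_binding *binding,
		 const void *pkt, size_t len);
```

A serial binding on Linux, for instance, could implement pkt_tx() with
write() calls to a PTY file descriptor and call mctp_pkt_rx() from its
read loop, while a low-level host firmware binding could implement the
same two operations directly against hardware registers.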
The lower interface would be plugged in to one of a number of
hardware-specific binding implementations. Most of these would be
included in the library source tree, but others could be plugged in too,
perhaps where a physical layer implementation does not make sense to
include in the platform-agnostic library.

The reason for a library is to allow the same MCTP implementation to be
used in both OpenBMC and host firmware; the library should be usable
from either end of the channel. To allow this, the library would be
written in portable C (structured so that it can be compiled as
`extern "C"` code in C++ codebases), and be configurable to suit those
runtime environments (for example, POSIX IO may not be available on all
platforms, so we should be able to compile the library to suit). The
licence for the library should also allow this re-use; a dual Apache &
GPLv2+ licence may be best.

These "lower" binding implementations may have very different methods of
transferring packets to the physical layer. For example, a serial
binding implementation running in a Linux environment may be implemented
through read()/write() syscalls to a PTY device. An I2C binding for use
in low-level host firmware environments may interact directly with
hardware registers to perform packet transfers.

The application-specific handlers (listed as (3) above) implement the
actual functionality provided over the MCTP channel. Each of these would
register with the MCTP core library to receive MCTP messages of a
certain type, and would transmit MCTP messages of that same type. While
the handlers themselves are out of scope for this design, there are a
few elements that are important here:

 - Handlers are likely to perform IO to other components of the BMC
   (such as sending and receiving dbus messages). To allow multiple
   handlers to co-exist, this IO should be implemented using
   non-blocking interfaces (eg, using poll()).

 - Handlers should be implemented as separate components from the main
   daemon, so that completely separate functionality (such as PLDM and
   NVMe-MI) does not need to exist in the same codebase. Having the core
   daemon load handlers as shared objects would allow this.

MCTP is intended to be an optional component of OpenBMC. Platforms using
OpenBMC are free to adopt it as they see fit.

## Alternatives Considered

There have been two main alternatives to this approach.

The first is to continue using IPMI, but start making more use of OEM
extensions to suit the requirements of new platforms. However, given
that the IPMI standard is no longer under active development, we would
likely end up with a large amount of platform-specific customisation.
This also does not solve the hardware channel issues in a standard
manner.

The second is to use Redfish between host and BMC. This would mean that
host firmware needs an HTTP client, a TCP/IP stack, a JSON
(de)serialiser, and support for the Redfish schema. While these may be
present in some environments (for example, UEFI-based firmware), this
may not be feasible for all host firmware implementations (for example,
OpenPOWER). It's possible that we could run a simplified Redfish stack -
indeed, there is a proposal for a Redfish-over-MCTP channel (DSP0218),
which uses a simplified serialisation format and has no requirement for
HTTP. However, this may still involve a large amount of complexity in
host firmware.

In terms of an MCTP daemon implementation, an alternative is to have the
core MCTP stack exist in a different process from the application
handlers.
For example, the MCTP core could only be responsible for proxying MCTP
messages to and from a dbus interface, as is currently done for IPMI
messages. However, the complexity, messaging overheads and state
management involved here have indicated that the added separation is not
a clear advantage.

## Impacts

Development would be required to implement the MCTP transport, plus any
new users of the MCTP messaging (eg, a PLDM implementation). These would
somewhat duplicate the work we have in the IPMI handlers.

We'd want to keep IPMI running in parallel, so the "upgrade" path should
be fairly straightforward.

Design and development need to involve potential host, management
controller and managed device implementations.

## Testing

For the core MCTP library, we are able to run tests in complete
isolation (I have already been able to run a prototype MCTP stack
through the afl fuzzer) to ensure that the core transport protocol
works; a sketch of such a fuzzing harness is included at the end of this
document.

For MCTP hardware bindings, we would develop channel-specific tests that
would be run in CI on both host and BMC.

For the OpenBMC MCTP daemon implementation, testing models would depend
on the structure we adopt in the design section.
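As an illustration of the isolated core-library testing described above,
the following is a minimal sketch of an afl-style fuzzing harness that
feeds arbitrary input into the packet receive path. It assumes the
hypothetical library interface sketched in the design section; the real
harness would be written against whatever API the library ends up
providing.

```c
/*
 * Fuzzing harness sketch. This relies on the hypothetical mctp_init() /
 * mctp_register_binding() / mctp_pkt_rx() interface sketched earlier in
 * this document; it is not written against any existing library.
 */
#include <stdint.h>
#include <stdio.h>

#include "mctp.h"	/* the hypothetical core library interface */

/* Discard any packets the core tries to transmit; we only exercise rx. */
static int fuzz_pkt_tx(struct mctp_binding *binding,
		       const void *pkt, size_t len)
{
	(void)binding;
	(void)pkt;
	(void)len;
	return 0;
}

int main(void)
{
	struct mctp *mctp = mctp_init();
	struct mctp_binding binding = {
		.name = "fuzz",
		.pkt_size = 64,
		.pkt_tx = fuzz_pkt_tx,
	};
	uint8_t buf[64];
	size_t len;

	mctp_register_binding(mctp, &binding);

	/* afl supplies input on stdin; treat each chunk as one rx packet. */
	while ((len = fread(buf, 1, sizeof(buf), stdin)) > 0)
		mctp_pkt_rx(mctp, &binding, buf, len);

	return 0;
}
```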