| -rw-r--r-- | designs/thermal_control_modes.md | 78 |
1 files changed, 78 insertions, 0 deletions
diff --git a/designs/thermal_control_modes.md b/designs/thermal_control_modes.md
new file mode 100644
index 0000000..d91f892
--- /dev/null
+++ b/designs/thermal_control_modes.md
@@ -0,0 +1,78 @@
+# Control.ThermalMode dbus interface with Supported and Current properties
+
+Author:
+  Matthew Barth !msbarth
+Primary assignee:
+  Matthew Barth !msbarth
+Other contributors:
+  None
+Created:
+  2019-02-06
+
+## Problem Description
+An issue was discovered where the exhaust heat from the system GPUs causes
+overtemp warnings on optical cables on certain system configurations. The
+issue can be resolved by altering the fan control application's floor table,
+effectively raising the floor when these optical cables exist, but an interface
+is needed to do so. Since the issue revolves around the optical cables
+themselves, and no mechanism currently exists to detect the presence of
+optical cables plugged into a card downwind from the GPUs' exhaust,
+an end-user must be presented with an ability to enable this raised floor
+speed table.
+
+## Background and References
+The witherspoon system supports PCI cards that could have optical cables
+plugged in place of copper cables. These optical cables can report overtemp
+warnings to the OS when high GPU utilization workloads exist. When this occurs
+with low enough CPU utilization, the fans could be kept at a given floor speed
+that sufficiently cools the components within the chassis, but not the optical
+cables sitting in the slow-moving hot exhaust.
+
+Without an available exhaust temp sensor, there's no direct way to determine
+the exhaust temp and include that within the fan control algorithm. A similar
+issue exists on other systems where mathematical calculations are done based on
+the overall power dissipation.
+
+Mathematical calculations to logically estimate exit air temps:
+https://github.com/openbmc/dbus-sensors/blob/master/src/ExitAirTempSensor.cpp
+
+## Requirements
+Create the ability for an end-user to enable the use of a thermal control mode
+other than the default. In this use-case, the mode is specific to an
+undetectable configuration that alters the fan floor speeds unrelated to
+standardized profiles/modes such as "Acoustic" and "Performance". Once the end-user
+selects a documented mode for the platform, the thermal control application
+alters its control algorithm according to the defined mode, which is
+implementation specific to that instance of the application on that platform.
+
+## Proposed Design
+Create a Control.ThermalMode dbus interface containing a supported list of
+available thermal control modes along with the current mode in use.
+Initially the current mode would be set to "Default" and the implementation
+of the interface would populate the supported list of modes.
+
+As one implementation, phosphor-fan-presence/control would be updated to extend
+this dbus interface object, which would fill in the list of supported modes
+from its fan control configuration for the platform. Once the fan control
+application starts, the interface would be added on the zone object and be
+available to query the supported modes or update the current mode.
+An end-user may set the current mode to any of those supported modes and the
+current mode would be persisted each time it is updated. This is to ensure
+each time the fan control application zone objects are started, the last set
+control mode is used.
+
+## Alternatives Considered
+Mathematical calculation to create a virtual exhaust temp sensor value based
+on overall power dissipation. However, in the witherspoon situation, using
+this technique would not be reliable in adjusting the floor speeds for only
+configurations using optical cables. This would instead present the possibility
+of raising floor speeds for configurations where it's unnecessary.
+
+## Impacts
+The thermal control application used must be configured to provide what thermal
+control modes are supported/available on the interface as well as perform the
+associated control changes when a mode is set.
+
+## Testing
+Trigger the use of an alternative fan floor table based on the thermal control
+mode selected on a witherspoon system.
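
Note on the Proposed Design: the behavior described there (a supported list filled in from configuration, a current mode that starts at "Default", and persistence of the current mode across restarts) can be sketched roughly as follows. This is only an illustrative in-process model, not the phosphor-fan-presence implementation and not a real dbus binding. The class name `ThermalMode`, the JSON state file, and the mode name `RaisedFloor` are all hypothetical, and always accepting "Default" as a settable value is an assumption the design does not spell out.

```python
import json
import tempfile
from pathlib import Path


class ThermalMode:
    """Hypothetical model of the proposed Supported/Current properties."""

    DEFAULT = "Default"

    def __init__(self, supported, persist_path):
        # Supported modes would come from the platform's fan control config.
        self._supported = list(supported)
        # Assumption: the current mode is persisted in a small JSON file.
        self._path = Path(persist_path)
        self._current = self._restore()

    @property
    def supported(self):
        return list(self._supported)

    @property
    def current(self):
        return self._current

    @current.setter
    def current(self, mode):
        # Assumption: "Default" is always allowed so a user can revert.
        if mode != self.DEFAULT and mode not in self._supported:
            raise ValueError(f"unsupported thermal mode: {mode!r}")
        self._current = mode
        # Persist on every update so a restart picks up the last set mode.
        self._path.write_text(json.dumps({"current": mode}))

    def _restore(self):
        # Fall back to "Default" when nothing valid has been persisted.
        try:
            saved = json.loads(self._path.read_text())["current"]
        except (OSError, ValueError, KeyError, TypeError):
            return self.DEFAULT
        if saved == self.DEFAULT or saved in self._supported:
            return saved
        return self.DEFAULT


if __name__ == "__main__":
    state = Path(tempfile.mkdtemp()) / "thermal_mode.json"
    zone = ThermalMode(["RaisedFloor"], state)  # hypothetical mode name
    zone.current = "RaisedFloor"
    # Simulate a fan control restart: the zone object is created again.
    print(ThermalMode(["RaisedFloor"], state).current)
```

Rejecting values outside the supported list keeps the control algorithm from being asked to apply a mode the platform configuration never defined. Restoring from the state file at construction time mirrors the requirement that the last set mode is used whenever the zone objects are started.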

