skiboot 6.0 release notes

Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
author: Stewart Smith <stewart@linux.ibm.com> 2018-05-11 16:37:07 -0500
committer: Stewart Smith <stewart@linux.ibm.com> 2018-05-11 17:17:13 -0500
commit: 7519644443707e972fed836853dd92031efcb858 (patch)
tree: a6e0131a23f0a12e81b3b426151223c82a438248 /doc
parent: 05d9d981df9c96c967882699574ff2cc4ae13f8b (diff)
1 files changed, 1027 insertions, 0 deletions
diff --git a/doc/release-notes/skiboot-6.0.rst b/doc/release-notes/skiboot-6.0.rst
new file mode 100644
index 00000000..30973455
--- /dev/null
+++ b/doc/release-notes/skiboot-6.0.rst
@@ -0,0 +1,1027 @@
+.. _skiboot-6.0:
+
+skiboot-6.0
+===========
+
+skiboot v6.0 was released on Friday May 11th 2018. It is the first
+release of skiboot 6.0, which is the new stable release of skiboot
+following the 5.11 release, first released April 6th 2018.
+
+Skiboot 6.0 is the basis for op-build v2.0 and is *required* for
+POWER9 systems.
+
+skiboot v6.0 contains all bug fixes as of :ref:`skiboot-5.11`,
+:ref:`skiboot-5.10.5`, and :ref:`skiboot-5.4.9` (the currently maintained
+stable releases). We do *not* expect any further stable releases in the
+5.10.x series, nor in the 5.11.x series.
+
+For details of how the skiboot stable releases work, see :ref:`stable-rules`.
+
+Over skiboot-5.11, we have the following changes:
+
+
+New Features
+------------
+
+Since 6.0-rc1:
+
+- Update default stop-state-disable mask to cut only stop11
+
+  Stability improvements in microcode for stop4/stop5 are
+  available in upstream hcode images. Stop4 and stop5 can
+  be safely enabled by default.
+
+  Use ~0xE0000000 to cut all but stop0,1,2 in case there
+  are any issues with stop4/5.
+
+  example: ::
+
+    nvram -p ibm,skiboot --update-config opal-stop-state-disable-mask=0x1FFFFFFF
+
+  **Note**: DD2.1 chips that have a frequency <1867MHz possibly *need* to
+  run an hcode image *different* from the default in op-build (set
+  `BR2_HCODE_LATEST_VERSION=y` in your config).
+- ibm,firmware-versions: add hcode to device tree
+
+  op-build commit 736a08b996e292a449c4996edb264011dfe56a40
+  added hcode to the VERSION partition, let's parse it out
+  and let the user know.
+- ipmi: Add BMC firmware version to device tree
+
+  The BMC Get Device ID command gives BMC firmware version details. Let's add
+  this to the device tree. User space tools will use this information to
+  display BMC version details.
+
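A worked example for the stop-state disable masks quoted above. This is only a sketch: it assumes, based on the statement that ~0xE0000000 cuts all but stop0,1,2, that stop level N is represented by bit (31 - N) of a 32-bit mask. The helper names are hypothetical and are not skiboot functions.

```python
# Assumed convention (inferred from "~0xE0000000 cuts all but stop0,1,2"):
# stop level N corresponds to bit (31 - N) of a 32-bit mask.
MASK32 = 0xFFFFFFFF


def stop_level_bit(level):
    """Return the mask bit assumed to represent one stop level."""
    if not 0 <= level <= 31:
        raise ValueError("stop level must be in 0..31")
    return 1 << (31 - level)


def disable_all_but(keep_levels):
    """Build a disable mask that cuts every level except keep_levels."""
    enabled = 0
    for level in keep_levels:
        enabled |= stop_level_bit(level)
    return ~enabled & MASK32


def disable_only(cut_levels):
    """Build a disable mask that cuts just the listed levels."""
    mask = 0
    for level in cut_levels:
        mask |= stop_level_bit(level)
    return mask


if __name__ == "__main__":
    print(f"all but stop0,1,2: 0x{disable_all_but([0, 1, 2]):08X}")
    print(f"only stop11:       0x{disable_only([11]):08X}")
```

Under that assumption, `disable_all_but([0, 1, 2])` gives 0x1FFFFFFF, the value passed to `nvram` in the example above.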
+Since 5.11:
+
+- Disable stop states from OPAL
+
+  On ZZ, stop4,5,11 are enabled for PowerVM, even though doing
+  so may cause problems with OPAL due to bugs in hcode.
+
+  For other platforms, this isn't so much of an issue as
+  we can just control stop states by the MRW. However the
+  rebuild-the-world approach to changing values there is a bit
+  annoying if you just want to rule out a specific stop state
+  from being problematic.
+
+  Provide an nvram option to override what's disabled in OPAL.
+
+  The OPAL mask is currently ~0xE0000000 (i.e. all but stop 0,1,2)
+
+  You can set an NVRAM override with: ::
+
+      nvram -p ibm,skiboot --update-config opal-stop-state-disable-mask=0xFFFFFFFF
+
+  This nvram override will disable *all* stop states.
+- interrupts: Create an "interrupts" property in the OPAL node
+
+  Deprecate the old "opal-interrupts". It's still there, but the new
+  property follows the standard and allows us to specify whether an
+  interrupt is level or edge sensitive.
+
+  Similarly create "interrupt-names" whose content is identical to
+  "opal-interrupts-names".
+- SBE: Add timer support on POWER9
+
+  The SBE on P9 provides a one-shot programmable timer facility. We can use
+  this to implement OPAL timers and hence limit the reliance on the Linux
+  heartbeat (similar to the HW timer facility provided by SLW on P8).
+- Add SBE driver support
+
+  SBE (Self Boot Engine) on P9 has two different jobs:
+
+  - Boot the chip up to the point the core is functional
+  - Provide various services like timer, scom, stash MPIPL, etc., at runtime
+
+  We will use SBE for various purposes like timer, MPIPL, etc.
+
+- opal:hmi: Add missing processor recovery reason string.
+
+  With this patch, the reason string is now printed for the CORE_WOF[43] bit. ::
+
+    [  477.352234986,7] HMI: [Loc: U78D3.001.WZS004A-P1-C48]: P:8 C:22 T:3: Processor recovery occurred.
+    [  477.352240742,7] HMI: Core WOF = 0x0000000000100000 recovered error:
+    [  477.352242181,7] HMI: PC - Thread hang recovery
+- Add DIMM actual speed to device tree
+
+  Recent HDAT provides the actual DIMM speed. Let's add this to the device tree.
+- Fix DIMM size property
+
+  Today we parse the VPD blob to get DIMM size information. This is limited
+  to FSP based systems. HDAT provides the DIMM size value, so let's use that
+  to populate the device tree. That way we can get size information on BMC
+  based systems as well.
+
+- PCI: Set slot power limit when supported
+
+  The PCIe slot capability can be implemented in a root or switch
+  downstream port to set the maximum power a card is allowed to draw
+  from the system. This patch adds support for setting the power limit
+  when the platform has defined one.
+- hdata/spira: parse vpd to add part-number and serial-number to xscom@ node
+
+  Expected by FWTS and associates our processor with the part/serial
+  number, which is obviously a good thing for one's own sanity.
+
+
+Improved HMI Handling
+^^^^^^^^^^^^^^^^^^^^^
+
+- opal/hmi: Add documentation for opal_handle_hmi2 call
+- opal/hmi: Generate hmi event for recovered HDEC parity error.
+- opal/hmi: check thread 0 tfmr to validate latched tfmr errors.
+
+  Due to P9 errata, HDEC parity and TB residue errors are latched for
+  non-zero threads 1-3 even if they are cleared. But these are not
+  latched on thread 0. Hence, use xscom SCOMC/SCOMD to read thread 0 tfmr
+  value and ignore them on non-zero threads if they are not present on
+  thread 0.
+- opal/hmi: Print additional debug information in rendezvous.
+- opal/hmi: Fix handling of TFMR parity/corrupt error.
+
+  While testing TFMR parity/corrupt errors it has been observed that HMIs are
+  delivered twice for this error:
+
+    - First time HMI is delivered with HMER[4,5]=1 and TFMR[60]=1.
+    - Second time HMI is delivered with HMER[4,5]=1 and TFMR[60]=0 with valid TB.
+
+  On second HMI we end up throwing "HMI: TB invalid without core error
+  reported" even though TB is in a valid state.
+- opal/hmi: Stop flooding HMI event for TOD errors.
+
+  Fix the issue where every thread on the chip sends an HMI event to the host
+  for TOD errors. TOD errors are reported to all the cores/threads on the
+  chip. Any one thread can fix the error and send the event. The rest of the
+  threads don't need to send an HMI event.
+- opal/hmi: Fix soft lockups during TOD errors
+
+  There are some TOD errors which do not affect the working of TOD and TB.
+  They stay in a valid state. Hence we don't need a rendez-vous for TOD errors
+  that do not affect TB.
+
+  TOD errors that affect TOD/TB will report a global error on TFMR[44]
+  along with bit 51, and they will go down the rendez-vous path as expected.
+
+  But the TOD errors that do not affect the TB register set only TFMR bit 51.
+  TFMR bit 51 is cleared when any single thread clears the TOD error.
+  Once cleared, bit 51 is reflected to all the cores on that chip. Any
+  thread that reads the TFMR register after the error is cleared will see
+  TFMR bit 51 reset. Hence the threads that see TFMR[51]=1 fall through to
+  the rendez-vous path and the threads that see TFMR[51]=0 return without
+  doing anything. This ends up in soft lockups in the host kernel.
+
+  This patch fixes the issue by not considering the TOD interrupt (TFMR[51])
+  as a core-global error, hence avoiding the rendez-vous path completely.
+  Instead, threads that see TFMR[51]=1 will now take a different path that
+  just does the TOD error recovery.
+- opal/hmi: Do not send HMI event if no errors are found.
+
+  For TOD errors, all the cores in the chip get HMIs. Any one thread from any
+  core can fix the issue and TFMR will have the error conditions cleared. The
+  rest of the threads need not take any action if the TOD errors are already
+  cleared. Hence thread 0 of every core should get a fresh copy of TFMR before
+  going ahead with the recovery path. Initialize recover = -1, so that if no
+  errors are found that thread need not send an HMI event to Linux. This stops
+  every thread flooding the host with HMI events even when no errors are found.
+- opal/hmi: Initialize the hmi event with old value of HMER.
+
+  Do this before we check for TFAC errors. Otherwise the event at host console
+  shows no error reported in HMER register.
+
+  Without this patch the console event shows HMER as all zeros: ::
+
+    [  216.753417] Severe Hypervisor Maintenance interrupt [Recovered]
+    [  216.753498]  Error detail: Timer facility experienced an error
+    [  216.753509]  HMER: 0000000000000000
+    [  216.753518]  TFMR: 3c12000870e04000
+
+  After this patch it shows old HMER values on host console: ::
+
+    [ 2237.652533] Severe Hypervisor Maintenance interrupt [Recovered]
+    [ 2237.652651]  Error detail: Timer facility experienced an error
+    [ 2237.652766]  HMER: 0840000000000000
+    [ 2237.652837]  TFMR: 3c12000870e04000
+- opal/hmi: Rework HMI handling of TFAC errors
+
+  This patch reworks the HMI handling for TFAC errors by introducing
+  4 rendez-vous points to improve thread synchronization while handling
+  timebase errors that require all threads to clear dirty data from the
+  TB/HDEC registers before clearing the errors.
+- opal/hmi: Don't bother passing HMER to pre-recovery cleanup
+
+  The test for TFAC error is now redundant so we remove it and
+  remove the HMER argument.
+- opal/hmi: Move timer related error handling to a separate function
+
+  Currently no functional change. This is a first step to completely
+  rewriting how these things are handled.
+- opal/hmi: Add a new opal_handle_hmi2 that returns direct info to Linux
+
+  It returns a 64-bit flags mask currently set to provide info
+  about which timer facilities were lost, and whether an event
+  was generated.
+- opal/hmi: Remove races in clearing HMER
+
+  Writing to HMER acts as an "AND". The current code writes back the
+  value we originally read with the bits we handled cleared. This is
+  racy, if a new bit gets set in HW after the original read, we'll end
+  up clearing it without handling it.
+
+  Instead, use an all 1's mask with only the bit handled cleared.
+- opal/hmi: Don't re-read HMER multiple times
+
+  We want to make sure all reporting and actions are based
+  upon the same snapshot of HMER in case bits get added
+  by HW while we are in OPAL.
+
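The HMER items above ('Remove races in clearing HMER' and 'Don't re-read HMER multiple times') can be illustrated with a toy model. The sketch below is not skiboot code: the register class, the function names and the two bit values are all hypothetical, and the only behaviour taken from the notes is that a write to the register acts as an AND.

```python
MASK64 = (1 << 64) - 1


class AndOnWriteRegister:
    """Toy model of a register where a write ANDs into the current value."""

    def __init__(self, value=0):
        self.value = value & MASK64

    def read(self):
        return self.value

    def write(self, data):
        # Bits written as 0 are cleared; bits written as 1 are left alone.
        self.value &= data & MASK64

    def hw_set(self, bits):
        # Stands in for hardware raising a new error bit at any time.
        self.value |= bits & MASK64


def racy_clear(reg, snapshot, handled):
    """Write back the earlier snapshot minus the handled bits."""
    reg.write(snapshot & ~handled)


def safe_clear(reg, handled):
    """Write all ones except the handled bits."""
    reg.write(~handled & MASK64)


def demo(clear_racily):
    # The two bit values are arbitrary placeholders, not real HMER bits.
    bit_a, bit_b = 0x8, 0x4
    reg = AndOnWriteRegister(bit_a)
    snapshot = reg.read()      # single snapshot used for all decisions
    reg.hw_set(bit_b)          # a new error arrives after the read
    if clear_racily:
        racy_clear(reg, snapshot, bit_a)
    else:
        safe_clear(reg, bit_a)
    return reg.read()


if __name__ == "__main__":
    print(f"racy: 0x{demo(True):x}")   # the late bit is lost
    print(f"safe: 0x{demo(False):x}")  # the late bit survives
```

In the racy variant the late bit is written back as 0 and silently cleared; with the all-ones mask it stays set and can be handled on the next pass.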
+libflash and ffspart
+^^^^^^^^^^^^^^^^^^^^
+
+Many improvements to the `ffspart` utility and `libflash` have come
+in this release, making `ffspart` suitable for building PNOR images that
+are bit-identical to those built by the existing tooling used by `op-build`.
+The plan is to switch `op-build` to use this infrastructure in the not too
+distant future.
+
+- libflash/blocklevel: Make read/write be ECC agnostic for callers
+
+  The blocklevel abstraction allows for regions of the backing store to be
+  marked as ECC protected so that blocklevel can decode/encode the ECC
+  bytes into the buffer automatically without the caller having to be ECC
+  aware.
+
+  Unfortunately this abstraction is far from perfect: it is only useful
+  if reads and writes are performed at the start of the ECC region or in
+  some circumstances at an ECC aligned position - which requires the
+  caller to be aware of the ECC regions.
+
+  The problem that has arisen is that the blocklevel abstraction is
+  initialised somewhere but when it is later called the caller is unaware
+  if ECC exists in the region it wants to arbitrarily read and write to.
+  This should not have been a problem since blocklevel knows. Currently
+  misaligned reads will fail ECC checks and misaligned writes will
+  overwrite ECC bytes and the backing store will become corrupted.
+
+  This patch adds the smarts to blocklevel_read() and blocklevel_write() to
+  cope with the problem. Note that ECC can always be bypassed by calling
+  blocklevel_raw_() functions.
+
+  All this work means that the gard tool can safely call
+  blocklevel_read() and blocklevel_write() and as long as the blocklevel
+  knows of the presence of ECC then it will deal with all cases.
+
+  This commit also removes code in the gard tool which compensated for
+  inadequacies no longer present in blocklevel.
+- libflash/blocklevel: Return region start from ecc_protected()
+
+  Currently all ecc_protected() does is say if a region is ECC protected
+  or not. Knowing a region is ECC protected is one thing but there isn't
+  much that can be done afterwards if this is the only known fact. A lot
+  more can be done if the caller is told where the ECC region begins.
+
+  Knowing where the ECC region starts allows the caller to align its
+  reads and writes. This allows for more flexibility calling read and write
+  without knowing exactly how the backing store is organised.
+- libflash/ecc: Add helpers to align a position within an ecc buffer
+
+  As part of ongoing work to make ECC invisible to higher levels up the
+  stack this function converts a 'position' which should be ECC agnostic
+  to the equivalent position within an ECC region starting at a specified
+  location.
+- libflash/ecc: Add functions to deal with unaligned ECC memcpy
+- external/ffspart: Improve error output
+- libffs: Fix bad checks for partition overlap
+
+  Not all TOCs are written at zero
+- libflash/libffs: Allow caller to specify header partition
+
+  An FFS TOC is comprised of two parts. A small header which has a magic
+  and very minimal information about the TOC which will be common to all
+  partitions, things like number of partitions, block sizes and the like.
+  Following this small header are a series of entries. Importantly there
+  is always an entry which encompasses the TOC itself, this is usually
+  called the 'part' partition.
+
+  Currently libffs always assumes that the 'part' partition is at zero.
+  While there is always a TOC at zero, there doesn't actually have to be.
+  PNORs may have multiple TOCs within them, therefore libffs needs to be
+  flexible enough to allow callers to specify TOCs not at zero.
+
+  The 'part' partition is otherwise a regular partition which may have
+  flags associated with it. libffs should allow the user to set the flags
+  for the 'part' partition.
+
+  This patch achieves both by allowing the caller to specify the 'part'
+  partition. If the caller does not, libffs will provide a sensible
+  default.
+- libflash/libffs: Refcount ffs entries
+
+  Currently consumers can add a new ffs entry to multiple headers. This
+  is fine, but freeing any of the headers will cause the entry to be freed,
+  which causes double free problems.
+
+  Even if only one header is used, the consumer of the library still has a
+  reference to the entry, which they may well reuse at some other point.
+
+  libffs will now refcount entries and only free when there are no more
+  references.
+
+  This patch also removes the pointless return value of ffs_hdr_free()
+- libflash/libffs: Switch to storing header entries in an array
+
+  Since the libffs no longer needs to sort the entries as they get added
+  it makes little sense to have the complexity of a linked list when an
+  array will suffice.
+- libflash/libffs: Remove backup partition from TOC generation code
+
+  It turns out this code was messy and not all that reliable. Doing it at
+  the library level adds complexity to the library and restrictions to the
+  caller.
+
+  A simpler approach can be achieved by just instantiating multiple
+  ffs_header structures pointing to different parts of the same file.
+- libflash/libffs: Remove the 'sides' from the FFS TOC generation code
+
+  It turns out this code was messy and not all that reliable. Doing it at
+  the library level adds complexity to the library and restrictions to the
+  caller.
+
+  A simpler approach can be achieved by just instantiating multiple
+  ffs_header structures pointing to different parts of the same file.
+- libflash/libffs: Always add entries to the end of the TOC
+
+  It turns out that sorted order isn't the best idea. This removes
+  flexibility from the caller. If the user wants their partitions in
+  sorted order, they should insert them in sorted order.
+- external/ffspart: Remove side, order and backup options
+
+  These options are currently flakey in libflash/libffs so there isn't
+  much point to being able to use them in ffspart.
+
+  Future reworks planned for libflash/libffs will render these options
+  redundant anyway.
+- libflash/libffs: ffs_close() should use ffs_hdr_free()
+- libflash/libffs: Add setter for a partition's actual size
+- pflash: Use ffs_entry_user_to_string() to standardise flag strings
+- libffs: Standardise ffs partition flags
+
+  It seems we've developed a character representation for ffs partition
+  flags. Currently only pflash really prints them so it hasn't been a
+  problem but now ffspart wants to read them in from user input.
+
+  It is important that what libffs reads and what pflash prints remain
+  consistent, so we should move the code into libffs to avoid problems.
+- external/ffspart: Allow # comments in input file
+
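The 'Refcount ffs entries' item above can be modelled in a few lines. This is a language-neutral toy, not the libffs API: the class and method names are invented for illustration, and 'freeing' is represented by a flag.

```python
class Entry:
    """Toy partition entry with an explicit reference count."""

    def __init__(self, name):
        self.name = name
        self.refs = 1          # the creator holds the first reference
        self.freed = False

    def get(self):
        assert not self.freed, "use after free"
        self.refs += 1
        return self

    def put(self):
        assert not self.freed, "double free"
        self.refs -= 1
        if self.refs == 0:
            self.freed = True  # stands in for releasing the memory


class Header:
    """Toy TOC header that holds references to its entries."""

    def __init__(self):
        self.entries = []

    def add(self, entry):
        # Each header takes its own reference on the shared entry.
        self.entries.append(entry.get())

    def free(self):
        # Dropping a header only drops that header's references.
        for entry in self.entries:
            entry.put()
        self.entries = []


if __name__ == "__main__":
    part = Entry("PAYLOAD")
    first, second = Header(), Header()
    first.add(part)
    second.add(part)

    first.free()               # entry survives: two references remain
    second.free()              # still alive: the creator's reference
    print(part.freed, part.refs)
    part.put()                 # last reference dropped, entry is freed
    print(part.freed, part.refs)
```

Freeing either header no longer frees the shared entry, and the consumer's own reference stays valid until it is explicitly dropped.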
+p9dsu Platform changes
+----------------------
+
+The p9dsu platform from SuperMicro (also known as 'Boston') has received
+a number of updates, and the patches once carried by SuperMicro are now
+upstream.
+
+Since 6.0-rc1:
+
+- p9dsu: timeout for variant detection, default to 2uess
+
+
+Since 5.11:
+
+- p9dsu: detect p9dsu variant even when hostboot doesn't tell us
+
+  The SuperMicro BMC can tell us what riser type we have, which dictates
+  the PCI slot tables. Usually, in an environment that a customer would
+  experience, Hostboot will do the query with an SMC specific patch
+  (not upstream as there's no platform specific code in hostboot)
+  and skiboot knows what variant it is based on the compatible string.
+
+  However, if you're using upstream hostboot, you only get the bare
+  'p9dsu' compatible type. We can work around this by asking the BMC
+  ourselves and setting the slot table appropriately. We do this
+  synchronously in platform init so that we don't start probing
+  PCI before we set up the slot table.
+- p9dsu: add slot power limit.
+- p9dsu: add pci slot table for Boston LC 1U/2U and Boston LA/ESS.
+- p9dsu HACK: fix system-vpd eeprom
+- p9dsu: change esel command from AMI to IBM 0x3a.
+
+ZZ Platform Changes
+-------------------
+
+- hdata/i2c: Fix up pci hotplug labels
+
+  These labels are used on the devices used to do PCIe slot power control
+  for implementing PCIe hotplug. I'm not sure how they ended up as
+  "eeprom-pgood" and "eeprom-controller" since that doesn't make any sense.
+- hdata/i2c: Ignore multi-port I2C devices
+
+  Recent FSP firmware builds add support for multi-port I2C devices such
+  as the GPIO expanders used for the presence detect of OpenCAPI devices
+  and the PCIe hotplug controllers used to power cycle PCIe slots on ZZ.
+
+  The OpenCAPI driver inside of skiboot currently uses a platform-specific
+  method to talk to the relevant I2C device rather than relying on HDAT
+  since not all platforms correctly report the I2C devices (hello Zaius).
+  Additionally the nature of multi-port devices requires that we have a
+  device-specific handler so that we generate the correct DT bindings.
+  Currently we don't and there is no immediate need for this support so
+  just ignore the multi-port devices for now.
+- hdata/i2c: Replace `i2c_` prefix with `dev_`
+
+  The current naming scheme makes it easy to conflate "i2cm_port" and
+  "i2c_port." The latter is used to describe multi-port I2C devices such
+  as GPIO expanders and multi-channel PCIe hotplug controllers. Rename
+  i2c_port to dev_port to make the two a bit more distinct.
+
+  Also rename i2c_addr to dev_addr for consistency.
+- hdata/i2c: Ignore CFAM I2C master
+
+  Recent FSP firmware builds put in information about the CFAM I2C master
+  in addition to the host I2C masters accessible via XSCOM. Odds are this
+  information should not be there since there's no handshaking between the
+  FSP/BMC and the host over who controls that I2C master, but it is so
+  we need to deal with it.
+
+  This patch adds filtering to the HDAT parser so it ignores the CFAM I2C
+  master. Without this it will create a bogus i2cm@<addr> which might cause
+  issues.
+- ZZ: hw/imc: Add support to load imc catalog lid file
+
+  Add support to load the imc catalog from a lid file packaged
+  as part of the system firmware. Lid number allocated
+  is 0x80f00103.lid.
+
+
+Bugs Fixed
+----------
+
+Since 6.0-rc2:
+
+- core/opal: Fix recursion check in opal_run_pollers()
+
+  An earlier commit introduced a counter variable poller_recursion to
+  limit the number of error messages shown when opal_pollers
+  are run recursively. However the check for the counter value was
+  placed in such a way that poller recursion was only detected the first
+  16 times and then allowed afterwards.
+
+  This patch fixes this by moving the check for the counter value inside
+  the conditional branch, with some refactoring, so that opal_poller
+  recursion is not erroneously allowed after poller recursion has been
+  detected the first 16 times.
+- phb4: Print WOF registers on fence detect
+
+  Without the WOF registers it's hard to figure out what went wrong first,
+  so print those when we print the FIRs when a fence is detected.
+- p9dsu: detect variant in init only if probe fails to find it.
+
+  Currently the slot table init happens twice, in both the probe and init
+  functions, because the variant detection logic is called with an
+  incorrect condition check.
+
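The recursion check fix described above can be sketched as follows. This is a reconstruction from the description only, not the skiboot source: the class, its fields and the way the 'buggy' variant is written are assumptions.

```python
WARN_LIMIT = 16  # the notes mention the first 16 detections


class Pollers:
    """Toy model of a re-entrancy guard with a capped warning count."""

    def __init__(self, buggy):
        self.buggy = buggy
        self.in_poller = False
        self.warnings = 0   # plays the role of the recursion counter
        self.reentries = 0  # times the body ran while already running

    def run(self, recurse=False):
        if self.buggy:
            # Counter test is part of the guard: once the limit is
            # reached the guard stops firing and recursion is allowed.
            if self.in_poller and self.warnings < WARN_LIMIT:
                self.warnings += 1
                return
        else:
            # Guard on recursion alone; the counter only limits logging.
            if self.in_poller:
                if self.warnings < WARN_LIMIT:
                    self.warnings += 1
                return

        if self.in_poller:
            self.reentries += 1
        was_in_poller = self.in_poller
        self.in_poller = True
        if recurse:
            self.run()      # a poller calling back into the pollers
        self.in_poller = was_in_poller


def count_reentries(buggy, attempts=20):
    """Return (re-entries that got through, warnings logged)."""
    pollers = Pollers(buggy)
    for _ in range(attempts):
        pollers.run(recurse=True)
    return pollers.reentries, pollers.warnings


if __name__ == "__main__":
    print("buggy:", count_reentries(True))
    print("fixed:", count_reentries(False))
```

With 20 recursive attempts the buggy guard lets the last 4 through once 16 warnings have been counted, while the fixed guard refuses all of them and still caps the warnings at 16.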
+Since 6.0-rc1:
+
+- core/direct-controls: improve p9_stop_thread error handling
+
+  p9_stop_thread should fail the operation if it finds the thread was
+  already quiesced. This implies something else is doing direct controls
+  on the thread (e.g., pdbg) or there is some exceptional condition we
+  don't know how to deal with. Proceeding here would cause things to
+  trample on each other, for example the hard lockup watchdog trying to
+  send a sreset to the core while it is stopped for debugging with pdbg
+  will end in tears.
+
+  If p9_stop_thread times out waiting for the thread to quiesce, do
+  not hit it with a core_start direct control, because we don't know
+  what state things are in and doing more things at this point is worse
+  than doing nothing. There is no good recipe described in the workbook
+  to de-assert the core_stop control if it fails to quiesce the thread.
+  After timing out here, the thread may eventually quiesce and get
+  stuck, but that's simpler to debug than undefined behaviour.
+
+- core/direct-controls: fix p9_cont_thread for stopped/inactive threads
+
+  Firstly, p9_cont_thread should check that the thread actually was
+  quiesced before it tries to resume it. Anything could happen if we
+  try this from an arbitrary thread state.
+
+  Then when resuming a quiesced thread that is inactive or stopped (in
+  a stop idle state), we must not send a core_start direct control;
+  clear_maint must be used in these cases.
+- hmi: Clear unknown debug trigger
+
+  On some systems, we see hangs like this when Linux starts: ::
+
+      [ 170.027252763,5] OCC: All Chip Rdy after 0 ms
+      [ 170.062930145,5] INIT: Starting kernel at 0x20011000, fdt at 0x30ae0530 366247 bytes)
+      [ 171.238270428,5] OPAL: Switch to little-endian OS
+
+  If you look at the in memory skiboot console (or do `nvram -p
+  ibm,skiboot --update-config log-level-driver=7`) we see the console get
+  spammed with: ::
+
+      [ 5209.109790675,7] HMI: Received HMI interrupt: HMER = 0x0000400000000000
+      [ 5209.109792716,7] HMI: Received HMI interrupt: HMER = 0x0000400000000000
+      [ 5209.109794695,7] HMI: Received HMI interrupt: HMER = 0x0000400000000000
+      [ 5209.109796689,7] HMI: Received HMI interrupt: HMER = 0x0000400000000000
+
+  We're taking the debug trigger (bit 17) early on, before the
+  hmi_debug_trigger function in the kernel is set up.
+
+  This clears the HMI in Skiboot and reports to the kernel instead of
+  bringing down the machine.
+
+- core/hmi: assign flags=0 in case nothing set by handle_hmi_exception
+
+  Theoretically we could have returned junk to the OS in this parameter.
+
+- SLW: Fix mambo boot to use stop states
+
+  After commit 35c66b8ce5a2 ("SLW: Move MAMBO simulator checks to
+  slw_init"), mambo boot no longer calls add_cpu_idle_state_properties()
+  and as such we never enable stop states.
+
+  After adding the call back, we get more testing coverage as well
+  as faster mambo SMT boots.
+
+- phb4: Hardware init updates
+
+  CFG Write Request Timeout was incorrectly set to informational and not
+  fatal for both non-CAPI and CAPI, so set it to fatal.  This was a
+  mistake in the specification.  Correcting this fixes a niche bug in
+  escalation (which is necessary on pre-DD2.2) that can cause a checkstop
+  due to a NCU timeout.
+
+  In addition, set the values in the timeout control registers to match.
+  This fixes an extremely rare and unreproducible bug, though the current
+  timings don't make sense since they're higher than the NCU timeout (16)
+  which will checkstop the machine anyway.
+
+- SLW: quieten 'Configuring self-restore' for DARN,NCU_SPEC_BAR and HRMOR
+
+Since 5.11:
+
+- core: Fix iteration condition to skip garded cpu
+- uart: fix uart_opal_flush to take console lock over uart_con_flush
+
+  This bug meant that OPAL_CONSOLE_FLUSH didn't take the appropriate locks.
+  Luckily, this call is currently only used in the crash path.
+- xive: fix missing unlock in error path
+- OPAL_PCI_SET_POWER_STATE: fix locking in error paths
+
+  Otherwise we could exit OPAL holding locks, potentially leading
+  to all sorts of problems later on.
+- hw/slw: Don't assert on an unknown chip
+
+  For some reason skiboot populates nodes in /cpus/ for the cores on
+  chips that are deconfigured. As a result Linux includes the threads
+  of those cores in its set of possible CPUs in the system and attempts
+  to set the SPR values that should be used when waking a thread from
+  a deep sleep state.
+
+  However, in the case where we have a deconfigured chip we don't create
+  an xscom node for that chip and as a result we don't have a proc_chip
+  structure for that chip either. In turn, this results in an assertion
+  failure when calling opal_slw_set_reg() since it expects the chip
+  structure to exist. Fix this up and print an error instead.
+- opal/hmi: Generate one event per core for processor recovery.
+
+  Processor recovery is a per-core error. All threads on that core receive
+  the HMI, but they don't all need to generate an HMI event for the same
+  error.
+
+  Let only thread 0 generate the event.
+- sensors: Don't add DTS sensors when OCC inband sensors are available
+
+  There are two sets of core temperature sensors today. One is DTS scom
+  based core temperature sensors and the second group is the sensors
+  provided by OCC. DTS is the highest temperature among the different
+  temperature zones in the core while OCC core temperature sensors are
+  the average temperature of the core. DTS sensors are read directly by
+  the host by SCOMing the DTS sensors while OCC sensors are read and
+  updated by OCC to main memory.
+
+  Reading DTS sensors by SCOMing is a heavier and slower operation
+  compared to reading OCC sensors, which is as good as reading memory.
+  So don't add DTS sensors when OCC sensors are available.
+- core/fast-reboot: Increase timeout for dctl sreset to 1sec
+
+  Direct control xscom can take more time to complete. We seem to
+  wait too little on Boston, failing fast-reboot for no good reason.
+
+  Increase timeout to 1 sec as a reasonable value for sreset to be delivered
+  and core to start executing instructions.
+- occ: sensors-groups: Add DT properties to mark HWMON sensor groups
+
+  Fix the sensor type to match HWMON sensor types. Add compatible flag
+  to indicate the environmental sensor groups so that operations on
+  these groups can be handled by HWMON linux interface.
+- core: Correctly load initramfs in stb container
+
+  Skiboot does not calculate the actual size and start location of the
+  initramfs if it is wrapped by an STB container (for example if loading
+  an initramfs from the ROOTFS partition).
+
+  Check if the initramfs is in an STB container and determine the size and
+  location correctly in the same manner as the kernel. Since
+  load_initramfs() is called after load_kernel() move the call to
+  trustedboot_exit_boot_services() into load_and_boot_kernel() so it is
+  called after both of these.
+- hdat/i2c.c: quieten "v2 found, parsing as v1"
+- hw/imc: Check for pause_microcode_at_boot() return status
+
+  pause_microcode_at_boot() loops through the ucode control block of
+  every chip and pauses the ucode if it is in the running state.
+  But it does not fail if any of the chips' ucode is not initialised.
+
+  Add code to return a failure if the ucode is not initialized on any
+  of the chips. Since pause_microcode_at_boot() is called just before
+  attaching the IMC device nodes in imc_init(), add code to check the
+  function's return value.
+
+
+Slot location code fixes:
+
+- npu2: Use ibm,loc-code rather than ibm,slot-label
+
+  The ibm,slot-label property is to name the slot that appears under a
+  PCIe bridge. In the past we (ab)used the slot tables to attach names
+  to GPU devices and their corresponding NVLinks which resulted in npu2.c
+  using slot-label as a location code rather than as a way to name slots.
+
+  Fix this up since it's confusing.
+- hdata/slots: Apply slot label to the parent slot
+
+  Slot names only really make sense when applied to an actual slot rather
+  than a device. On witherspoon the GPU devices have a name associated with
+  the device rather than the slot for the GPUs. Add a hack that moves the
+  slot label to the parent slot rather than on the device itself.
+- pci-dt-slot: Big ol' cleanup
+
+  The underlying data that we get from HDAT can only really describe a
+  PCIe system. As such we can simplify the devicetree slot lookup code
+  by only caring about the important cases, namely, root ports and switch
+  downstream ports.
+
+  This also fixes a bug where root ports didn't get a slot label applied,
+  which results in devices under that port not having ibm,loc-code set.
+  This in turn results in the EEH core being unable to report the location
+  of EEHed devices under that port.
+
+opal-prd
+^^^^^^^^
+- opal-prd: Insert powernv_flash module
+
+  Explicitly load the powernv_flash module on BMC based systems so that we
+  are sure that the flash device is created before starting the opal-prd
+  daemon.
+
+  Note that I have replaced the pnor_available() check with is_fsp_system(),
+  as we want to load the module on BMC systems only. Also pnor_init has
+  enough logic to detect the flash device. Hence pnor_available() becomes a
+  redundant check.
+
+NPU2/NVLINK2
+^^^^^^^^^^^^
+- npu2/hw-procedures: fence bricks on GPU reset
+
+  The NPU workbook defines a way of fencing a brick and
+  getting the brick out of fence state. We do have an implementation
+  of bringing the brick out of fenced/quiesced state. We do
+  the latter in our procedures, but to support run time reset
+  we need to do the former.
+
+  The fencing ensures that access to memory behind the links
+  will not lead to HMI's, but instead SUE's will be populated
+  in cache (in the case of speculation). The expectation is then
+  that prior to and after reset, the operating system components
+  will flush the cache for the region of memory behind the GPU.
+
+  This patch does the following:
+
+  1. Implements an npu2_dev_fence_brick() function to set/clear
+     fence state
+  2. Clears FIR bits prior to clearing the fence status
+  3. Clears the fence status
+  4. Takes the powerbus out of CQ fence much later,
+     in credits_check(), which is the last hardware procedure
+     called after link training.
+- hw/npu2.c: Remove static configuration of NPU2 register
+
+  The NPU_SM_CONFIG0 register currently needs to be configured in Skiboot to
+  select NVLink mode, however Hostboot should configure other bits in this
+  register.
+
+  For some reason Skiboot was explicitly clearing bit-6
+  (CONFIG_DISABLE_VG_NOT_SYS). It is unclear why this bit was getting cleared
+  as recent Hostboot versions explicitly set it to the correct value based on
+  the specific system configuration. Therefore Skiboot should not alter it.
+
+  Bit-58 (CONFIG_NVLINK_MODE) selects if NVLink mode should be enabled or
+  not. Hostboot does not configure this bit so Skiboot should continue to
+  configure it.
+- npu2: Improve log output of GPU-to-link mapping
+
+  Debugging issues related to unconnected NVLinks can be a little less
+  irritating if we use the NPU2DEV{DBG,INF}() macros instead of prlog().
+
+  In short, change this: ::
+
+      NPU2: comparing GPU 'GPU2' and NPU2 'GPU1'
+      NPU2: comparing GPU 'GPU3' and NPU2 'GPU1'
+      NPU2: comparing GPU 'GPU4' and NPU2 'GPU1'
+      NPU2: comparing GPU 'GPU5' and NPU2 'GPU1'
+            :
+      npu2_dev_bind_pci_dev: No PCI device for NPU2 device 0006:00:01.0 to bind to. If you expect a GPU to be there, this is a problem.
+
+  to this: ::
+
+      NPU6:0:1.0 Comparing GPU 'GPU2' and NPU2 'GPU1'
+      NPU6:0:1.0 Comparing GPU 'GPU3' and NPU2 'GPU1'
+      NPU6:0:1.0 Comparing GPU 'GPU4' and NPU2 'GPU1'
+      NPU6:0:1.0 Comparing GPU 'GPU5' and NPU2 'GPU1'
+            :
+      NPU6:0:1.0 No PCI device found for slot 'GPU1'
+- npu2: Move NPU2_XTS_BDF_MAP_VALID assignment to context init
+
+  A bad GPU or other condition may leave us with a subset of links that
+  never get initialized. If an ATSD is sent to one of those bricks, it
+  will never complete, leaving us waiting forever for a response: ::
+
+    watchdog: BUG: soft lockup - CPU#23 stuck for 23s! [acos:2050]
+    ...
+    Modules linked in: nvidia_uvm(O) nvidia(O)
+    CPU: 23 PID: 2050 Comm: acos Tainted: G        W  O    4.14.0 #2
+    task: c0000000285cfc00 task.stack: c000001fea860000
+    NIP:  c0000000000abdf0 LR: c0000000000acc48 CTR: c0000000000ace60
+    REGS: c000001fea863550 TRAP: 0901   Tainted: G        W  O     (4.14.0)
+    MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 28004484  XER: 20040000
+    CFAR: c0000000000abdf4 SOFTE: 1
+    GPR00: c0000000000acc48 c000001fea8637d0 c0000000011f7c00 c000001fea863820
+    GPR04: 0000000002000000 0004100026000000 c0000000012778c8 c00000000127a560
+    GPR08: 0000000000000001 0000000000000080 c000201cc7cb7750 ffffffffffffffff
+    GPR12: 0000000000008000 c000000003167e80
+    NIP [c0000000000abdf0] mmio_invalidate_wait+0x90/0xc0
+    LR [c0000000000acc48] mmio_invalidate.isra.11+0x158/0x370
+
+
+  ATSDs are only sent to bricks which have a valid entry in the XTS_BDF
+  table. So to prevent the hang, don't set NPU2_XTS_BDF_MAP_VALID unless
+  we make it all the way to creating a context for the BDF.
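+
+The NPU_SM_CONFIG0 entry above boils down to a read-modify-write that
+touches a single bit. The following is a minimal sketch, written in Python
+rather than skiboot's C, and it assumes the bit numbers in that entry use
+IBM numbering (bit 0 is the most significant bit of a 64-bit register).
+The names ``ppc_bit`` and ``select_nvlink_mode`` and the sample register
+value are hypothetical and are not taken from skiboot.
+

```python
MASK64 = (1 << 64) - 1


def ppc_bit(bit):
    """Mask for one bit in IBM numbering, where bit 0 is the MSB."""
    if not 0 <= bit <= 63:
        raise ValueError("bit must be in 0..63")
    return 1 << (63 - bit)


# Bit positions quoted in the release note; IBM numbering is an assumption.
CONFIG_DISABLE_VG_NOT_SYS = ppc_bit(6)
CONFIG_NVLINK_MODE = ppc_bit(58)


def select_nvlink_mode(reg, enable=True):
    """Return reg with only CONFIG_NVLINK_MODE changed.

    Every other bit, including CONFIG_DISABLE_VG_NOT_SYS, is preserved.
    """
    if enable:
        return (reg | CONFIG_NVLINK_MODE) & MASK64
    return reg & ~CONFIG_NVLINK_MODE & MASK64


# Hypothetical value left behind by Hostboot, with bit 6 already set.
hostboot_value = CONFIG_DISABLE_VG_NOT_SYS | 0x1
updated = select_nvlink_mode(hostboot_value)
print(hex(hostboot_value), "->", hex(updated))
```

+
+The point of the sketch is that whatever Hostboot wrote to bit 6 survives
+the update, whether it was set or clear.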
+
+Secure and Trusted Boot
+^^^^^^^^^^^^^^^^^^^^^^^
+- hdata/tpmrel: detect tpm not present by looking up the stinfo->status
+
+  Skiboot detects if tpm is present by checking if a secureboot_tpm_info
+  entry exists. However, if a tpm is not present, hostboot also creates a
+  secureboot_tpm_info entry. In this case, hostboot creates an empty
+  entry, but setting the field tpm_status to TPM_NOT_PRESENT.
+
+  This detects if tpm is not present by looking up the stinfo->status.
+
+  This fixes the "TPMREL: TPM node not found for chip_id=0 (HB bug)"
+  issue, reproduced when skiboot is running on a system that has no tpm.
+
+PCI
+^^^
+- phb4: Restore bus numbers after CRS
+
+  Currently we restore PCIe bus numbers right after the link is
+  up. Unfortunately at this point we haven't done CRS so config space
+  may not be accessible.
+
+  This moves the bus number restore till after CRS has happened.
+- romulus: Add a barebones slot table
+- phb4: Quieten and improve "Timeout waiting for electrical link"
+
+  This happens normally if a slot doesn't have a working HW presence
+  detect and relies instead on inband presence detect.
+
+  The message we display is scary and not very useful unless you
+  are debugging, so quieten it and change it to something more
+  meaningful.
+- pcie-slot: Don't fail powering on an already on switch
+
+  If the power state is already the required value, return
+  OPAL_SUCCESS rather than OPAL_PARAMETER to avoid spurious
+  errors during boot.
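+
+The pcie-slot entry above describes an idempotent power-state request.
+Below is a minimal sketch of that behaviour in Python rather than
+skiboot's C. ``Slot`` and ``set_power_state`` are hypothetical stand-ins
+for illustration only; the return-code names come from the entry, and
+their numeric values (0 and -1) are those of the OPAL API.
+

```python
OPAL_SUCCESS = 0
OPAL_PARAMETER = -1

POWER_OFF, POWER_ON = 0, 1


class Slot:
    """Hypothetical stand-in for a PCIe slot, not skiboot's pci_slot."""

    def __init__(self, power_state):
        self.power_state = power_state
        self.transitions = 0  # counts real power changes


def set_power_state(slot, requested):
    """Request a power state; asking for the current state is a no-op."""
    if requested not in (POWER_OFF, POWER_ON):
        return OPAL_PARAMETER
    if slot.power_state == requested:
        # Already in the requested state: succeed without touching hardware.
        return OPAL_SUCCESS
    slot.power_state = requested
    slot.transitions += 1
    return OPAL_SUCCESS


switch = Slot(POWER_ON)
print(set_power_state(switch, POWER_ON), switch.transitions)
```

+
+In this sketch a genuinely invalid request still returns OPAL_PARAMETER;
+only the "already there" case is treated as success.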
+
+CAPI/OpenCAPI
+^^^^^^^^^^^^^
+- capi: Keep the current mmio windows in the mbt cache table.
+
+  When the phb is used as a CAPI interface, the current mmio windows list
+  is cleaned before adding the capi and the prefetchable memory (M64)
+  windows, which implies that the non-prefetchable BAR is no longer
+  configured.
+  This patch sets only the mbt bar used to pass the capi mmio window and
+  keeps the other mmio values (M32 and M64) as they are defined.
+- npu2-opencapi: Fix 'link internal error' FIR, take 2
+
+  When setting up an opencapi link, we set the transport muxes first,
+  then set the PHY training config register, which includes disabling
+  nvlink mode for the bricks. That's the order of the init sequence, as
+  found in the NPU workbook.
+
+  In reality, doing so works, but it raises 2 FIR bits in the PowerBus
+  OLL FIR Register for the 2 links when we configure the transport
+  muxes. Presumably because nvlink is not disabled yet and we are
+  configuring the transport muxes for opencapi.
+
+  bit 60:
+    link0 internal error
+  bit 61:
+    link1 internal error
+
+  Overall the current setup ends up being correct and everything works,
+  but we raise 2 FIR bits.
+
+  So tweak the order of operations to disable nvlink before configuring
+  the transport muxes. Incidentally, this is what the scripts from the
+  opencapi enablement team were doing all along.
+- npu2-opencapi: Fix 'link internal error' FIR, take 1
+
+  When we set up a link, we always enable ODL0 and ODL1 at the same time
+  in the PHY training config register, even though we are setting up
+  only one OTL/ODL, so it raises a "link internal error" FIR bit in the
+  PowerBus OLL FIR Register for the second link. The error is harmless,
+  as we'll eventually set up the second link, but there's no reason to
+  raise that FIR bit.
+
+  The fix is simply to only enable the ODL we are using for the link.
+- phb4: Do not set the PBCQ Tunnel BAR register when enabling capi mode.
+
+  The cxl driver will set the capi value, like other drivers already do.
+- phb4: set TVT1 for tunneled operations in capi mode
+
+  The ASN indication is used for tunneled operations (as_notify and
+  atomics). Tunneled operation messages can be sent in PCI mode as
+  well as CAPI mode.
+
+  The address field of as_notify messages is hijacked to encode the
+  LPID/PID/TID of the target thread, so those messages should not go
+  through address translation. Therefore bit 59 is part of the ASN
+  indication.
+
+  This patch sets TVT#1 in bypass mode when capi mode is enabled,
+  to prevent as_notify messages from being dropped.
+
+Debugging/Testing improvements
+------------------------------
+
+Since 6.0-rc1:
+
+- mambo: Enable XER CA32 and OV32 bits on P9
+
+  POWER9 adds 32 bit carry and overflow bits to the XER, but we need to
+  set the relevant CTRL1 bit to enable them.
+- Makefile: Fix building natively on ppc64le
+
+  When on ppc64le and CROSS is not set by the environment, make assumes
+  ppc64 and sets a default CROSS. Check for ppc64le as well, so that
+  'make' works out of the box on ppc64le.
+- Experimental support for building with Clang
+- Improvements to testing and Travis CI
+
+Since 5.11:
+
+- core/stack: backtrace unwind basic OPAL call details
+
+  Put OPAL callers' r1 into the stack back chain, and then use that to
+  unwind back to the OPAL entry frame (as opposed to boot entry, which
+  has a 0 back chain).
+
+  From there, dump the OPAL call token and the caller's r1. A backtrace
+  looks like this: ::
+
+      CPU 0000 Backtrace:
+       S: 0000000031c03ba0 R: 000000003001a548   ._abort+0x4c
+       S: 0000000031c03c20 R: 000000003001baac   .opal_run_pollers+0x3c
+       S: 0000000031c03ca0 R: 000000003001bcbc   .opal_poll_events+0xc4
+       S: 0000000031c03d20 R: 00000000300051dc   opal_entry+0x12c
+       --- OPAL call entry token: 0xa caller R1: 0xc0000000006d3b90 ---
+
+  This is pretty basic for the moment, but it does give you the bottom
+  of the Linux stack. It will allow some interesting improvements in
+  future.
+
+  First, with the eframe, all the call's parameters can be printed out
+  as well.  The ___backtrace / ___print_backtrace API needs to be
+  reworked in order to support this, but it's otherwise very simple
+  (see opal_trace_entry()).
+
+  Second, it will allow Linux's stack to be passed back to Linux via
+  a debugging opal call. This will allow Linux's BUG() or xmon to
+  also print the Linux back trace in case of a NMI or MCE or watchdog
+  lockup that hits in OPAL.
+- asm/head: implement quiescing without stack or clobbering regs
+
+  Quiescing currently is implemented in C in opal_entry before the
+  opal call handler is called. This works well enough for simple
+  cases like fast reset when one CPU wants all others out of the way.
+
+  Linux would like to use it to prevent an sreset IPI from
+  interrupting firmware, which could lead to deadlocks when crash
+  dumping or entering the debugger. Linux interrupts do not recover
+  well when returning back to general OPAL code, due to r13 not being
+  restored. OPAL also can't be re-entered, which may happen e.g.,
+  from the debugger.
+
+  So move the quiesce hold/reject to entry code, before the stack or
+  r1 or r13 registers are switched. OPAL can be interrupted and
+  returned to or re-entered during this period.
+
+  This does not completely solve all such problems. OPAL will be
+  interrupted with sreset if the quiesce times out, and it can be
+  interrupted by MCEs as well. These still have the issues above.
+- core/opal: Allow poller re-entry if OPAL was re-entered
+
+  If an NMI interrupts the middle of running pollers and the OS
+  invokes pollers again (e.g., for console output), the poller
+  re-entrancy check will prevent it from running and spam the
+  console.
+
+  That check was designed to catch a poller calling opal_run_pollers;
+  OPAL re-entrancy is something different and is detected elsewhere.
+  Avoid the poller recursion check if OPAL has been re-entered. This
+  is a best-effort attempt to cope with errors.
+- core/opal: Emergency stack for re-entry
+
+  This detects OPAL being re-entered by the OS, and switches to an
+  emergency stack if it was. This protects the firmware's main stack
+  from re-entrancy and allows the OS to use NMI facilities for crash
+  / debug functionality.
+
+  Further nested re-entry will destroy the previous emergency stack
+  and prevent returning, but those should be rare cases.
+
+  This stack is sized at 16kB, which doubles the size of CPU stacks,
+  so as not to introduce a regression in primary stack size. The 16kB
+  stack originally had a 4kB machine check stack at the top, which was
+  removed by 80eee1946 ("opal: Remove machine check interrupt patching
+  in OPAL."). So it is possible the size could be tightened again, but
+  that would require further analysis.
+
+- hdat_to_dt: hash_prop the same on all platforms
+
+  Fixes this unit test on ppc64le hosts.
+- mambo: Add persistent memory disk support
+
+  This adds support for mapping disk images using persistent
+  memory. Disks can be added by setting this environment variable: ::
+
+    PMEM_DISK="/mydisks/disk1.img,/mydisks/disk2.img"
+
+  These will show up in Linux as /dev/pmem0 and /dev/pmem1.
+
+  This uses a new feature in mambo "mysim memory mmap .." which is only
+  available since mambo commit 0131f0fc08 (from 24/4/2018).
+
+  This also needs the of_pmem.c driver in Linux which is only available
+  since v4.17. It works with powernv_defconfig + CONFIG_OF_PMEM.
+- external/mambo: Add di command to decode instructions
+
+  By default you get 16 instructions but you can specify the number you
+  want, e.g.: ::
+
+      systemsim % di 0x100 4
+      0x0000000000000100: Enc:0xA64BB17D : mtspr   HSPRG1,r13
+      0x0000000000000104: Enc:0xA64AB07D : mfspr   r13,HSPRG0
+      0x0000000000000108: Enc:0xF0092DF9 : std     r9,0x9F0(r13)
+      0x000000000000010C: Enc:0xA6E2207D : mfspr   r9,PPR
+
+  Using di since it's what xmon uses.
+- mambo/mambo_utils.tcl: Inject an MCE at a specified address
+
+  Currently we don't support injecting an MCE at a specific address.
+  Doing so is useful for testing functionality like memcpy_mcsafe()
+  (see https://patchwork.ozlabs.org/cover/893339/)
+
+  The core of the functionality is a routine called
+  inject_mce_ue_on_addr, which takes an addr argument and injects
+  an MCE (load/store with UE) when the specified address is accessed
+  by code. This functionality can easily be enhanced to cover
+  instruction UEs as well.
+
+  A sample use case to create an MCE on stack access would be ::
+
+    set addr [mysim display gpr 1]
+    inject_mce_ue_on_addr $addr
+
+  This would cause an MCE on any r1 or r1-based access.
+- external/mambo: improve helper for machine checks
+
+  Improve workarounds for stop injection, because mambo often will
+  trigger on 0x104/204 when injecting sreset/mces.
+
+  This also adds a workaround to skip injecting on reservations to
+  avoid infinite loops when doing inject_mce_step.
+- travis: Enable ppc64le builds
+
+  At least on the IBM Travis Enterprise instance, we can now do
+  ppc64le builds!
+
+  We can only build a subset of our matrix due to availability of
+  ppc64le distros. The Dockerfiles need some tweaking to only
+  attempt to install (x86_64 only) Mambo binaries, as well as the
+  build scripts.
+- external: Add "lpc" tool
+
+  This is a little front-end to the lpc debugfs files to access
+  the LPC bus from userspace on the host.
+- core/test/run-trace: fix on ppc64el
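+
+The "core/opal: Allow poller re-entry if OPAL was re-entered" entry above
+separates two situations that look alike: a poller calling the poller
+runner recursively, and the OS re-entering OPAL (for example from an NMI
+handler) while pollers were running. The following is a rough sketch of
+that decision in Python rather than skiboot's C. The ``Cpu`` class, its
+field names, ``run_pollers`` and the log text are all hypothetical and
+only illustrate the idea.
+

```python
class Cpu:
    """Hypothetical per-CPU state; field names are illustrative only."""

    def __init__(self):
        self.opal_call_depth = 0  # 1 = normal OPAL call, >1 = re-entered
        self.in_poller = False


def run_pollers(cpu, pollers, log):
    """Run every poller unless a poller is recursing into the runner.

    Returns True if the pollers ran, False if the call was rejected.
    """
    reentered = cpu.opal_call_depth > 1
    if cpu.in_poller and not reentered:
        # A poller called run_pollers() itself: reject and complain.
        log.append("poller recursion detected")
        return False

    previous = cpu.in_poller
    cpu.in_poller = True
    try:
        for poller in pollers:
            poller()
    finally:
        cpu.in_poller = previous
    return True


# Demo: pollers are already running on this CPU.
cpu = Cpu()
cpu.opal_call_depth = 1
cpu.in_poller = True
log, ran = [], []

# Same OPAL call: this is genuine poller recursion and is rejected.
rejected = run_pollers(cpu, [lambda: ran.append("console")], log)

# The OS re-entered OPAL (e.g. from an NMI): the check is skipped.
cpu.opal_call_depth = 2
accepted = run_pollers(cpu, [lambda: ran.append("console")], log)
print(rejected, accepted, ran, log)
```

+
+As the entry says, this is a best-effort measure: skipping the check lets
+console output through after an NMI, but it does not make the pollers
+themselves re-entrant.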