talos-skiboot - Talos™ II skiboot sources

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	Makefile: Paper over gard and pflash coverage issues	Andrew Jeffery	2019-02-21	3	-0/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	`make coverage-report` gave the following error: (cd external/pflash; lcov -q -c -d . -o pflash.info --rc lcov_branch_coverage=1; sed -i -e 's%external/pflash/libflash%libflash%; s%external/pflash/ccan%ccan%' pflash.info) (cd external/gard; lcov -q -c -d . -o gard.info --rc lcov_branch_coverage=1; sed -i -e 's%external/gard/libflash%libflash%; s%external/gard/ccan%ccan%' gard.info) geninfo: WARNING: no .gcda files found in . - skipping! geninfo: WARNING: no .gcda files found in . - skipping! lcov -q -c -d . -d ccan/check_type/test/ -d ccan/str/test/ -d ccan/str/test/ -d ccan/list/test/ -d ccan/list/test/ -d ccan/list/test/ -d ccan/list/test/ -d ccan/list/test/ -d ccan/build_assert/test/ -d ccan/short_types/test/ -d ccan/short_types/test/ -d ccan/array_size/test/ -d ccan/container_of/test/ -d ccan/endian/test/ -d libc/test/ -d libc/test/ -d libc/test/ -d libc/test/ -o skiboot.info --rc lcov_branch_coverage=1 lcov -q -r skiboot.info 'external/pflash/' -o skiboot.info lcov -q -r skiboot.info 'external/gard/' -o skiboot.info lcov -q -a skiboot.info -a external/pflash/pflash.info -o skiboot.info lcov: ERROR: no valid records found in tracefile external/pflash/pflash.info make: *** [/home/andrew/src/open-power/skiboot/Makefile.main:315: skiboot.info] Error 255 And similar again for the gard tool. We should really untangle the build strategy for tools in external/, but in the mean time paper over the problem of generating the lcov output at the top level by ensuring we have a means to generate the necessary gcda files for lcov to consume. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	gard: Fix warnings from gcc 8.2.0	Andrew Jeffery	2019-02-21	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	gard.c:285:5: error: no previous prototype for ‘parse_path’ [-Werror=missing-prototypes] int parse_path(const char str, struct entity_path parsed) ^~~~~~~~~~ gard.c: In function ‘do_list’: gard.c:459:46: error: unused parameter ‘argc’ [-Werror=unused-parameter] static int do_list(struct gard_ctx ctx, int argc, char argv) ~~~~^~~~ gard.c:459:59: error: unused parameter ‘argv’ [-Werror=unused-parameter] static int do_list(struct gard_ctx ctx, int argc, char **argv) ~~~~~~~^~~~ Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	pflash: Increase stack frame size warning threshold	Andrew Jeffery	2019-02-21	1	-0/+1
\| \| \| \| \| \| \| \|	pflash is a userspace tool, stack space isn't really a constraint that we care about. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	ffspart, libflash: Fix stack size warnings	Andrew Jeffery	2019-02-21	2	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	libflash/file.c: In function 'file_erase': libflash/file.c:134:1: error: the frame size of 4128 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] } ^ and ffspart.c: In function ‘main’: ffspart.c:529:1: error: the frame size of 4864 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] } ^ In both cases, mark the local variables as static to avoid the stack. The static approach is valid for file.c as the buffer is always filled with `~0`. Given it's now going to be in .bss due to static we have to still perform the memset(), but racing memset()s in this fashion won't be harmful, just wasteful. For ffspart.c's main(), there are bigger problems if that needs to be re-entrant. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
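The static-buffer approach described in the commit above can be sketched as follows. This is a minimal illustration under stated assumptions, not the libflash code: `ERASE_CHUNK`, `erase_range()`, `chunk_writer`, `struct erase_stats` and `check_chunk()` are hypothetical names invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ERASE_CHUNK 4096 /* hypothetical chunk size, not taken from libflash */

/* Hypothetical sink standing in for a write to the backing file. */
typedef int (*chunk_writer)(const void *buf, size_t len, void *priv);

/* Push `len` bytes of erased-flash pattern (0xff) through `write_chunk`. */
int erase_range(size_t len, chunk_writer write_chunk, void *priv)
{
	/* static: the buffer lives in .bss, so it adds nothing to the stack frame */
	static char buf[ERASE_CHUNK];

	assert(write_chunk != NULL);

	/* .bss starts out zeroed, so the fill is still required on every call */
	memset(buf, ~0, sizeof(buf));

	while (len > 0) {
		size_t n = len < sizeof(buf) ? len : sizeof(buf);
		int rc = write_chunk(buf, n, priv);

		if (rc)
			return rc;
		len -= n;
	}
	return 0;
}

/* Example sink: counts bytes and records whether all of them were 0xff. */
struct erase_stats {
	size_t total;
	int all_ff;
};

int check_chunk(const void *buf, size_t len, void *priv)
{
	const unsigned char *p = buf;
	struct erase_stats *st = priv;
	size_t i;

	for (i = 0; i < len; i++)
		if (p[i] != 0xff)
			st->all_ff = 0;
	st->total += len;
	return 0;
}
```

Because every caller writes the same `~0` pattern into the shared buffer, concurrent calls would only repeat the fill; they would not change what ends up in the buffer.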
*	libflash/test: Generate header dependencies for tests	Andrew Jeffery	2019-02-21	1	-1/+4
\| \| \| \| \| \|	Cc: stable Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	hw/phb3/naples: Disable D-states	Alexey Kardashevskiy	2019-02-20	1	-0/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Putting "Mellanox Technologies MT27700 Family [ConnectX-4] [15b3:1013]" (more precisely, the second of its 2 PCI functions, no matter in what order) into the D3 state causes EEH with the "PCT timeout" error. This has been noticed on garrison machines only; firestones do not seem to have this issue. This disables D-state changes for devices on root buses on Naples by installing a config space access filter (copied from PHB4). Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-By: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	hdata/memory: Add NVDIMM support	Oliver O'Halloran	2019-02-20	1	-10/+100
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	NVDIMMs are memory modules that use a battery backup system to allow the contents of RAM to be saved to non-volatile storage if system power goes away unexpectedly. This allows them to be used as a high-performance storage device, suitable for serving as a cache for SSDs and the like. Configuration of NVDIMMs is handled by hostboot and communicated to OPAL via the HDAT. We need to parse out the NVDIMM memory ranges and create memory regions with the "pmem-region" compatible label to make them available to the host. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	hdata/memory: Remove find_shared()	Oliver O'Halloran	2019-02-20	1	-40/+3
\| \| \| \| \| \| \| \| \| \|	This helper function is used to check if the node we are about to create already exists. There's no real need for this considering we already have perfectly functional methods for searching the device-tree, so drop it in favour of the more standard dt_find_name_addr(). Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	hdata/test: Add OP920 HDAT test data	Oliver O'Halloran	2019-02-20	3	-0/+4904
\| \| \| \| \| \| \| \|	It's probably about time we did that. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> [stewart: add in dts result] Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	hdata/test: Fix up linux,sml-base property	Oliver O'Halloran	2019-02-20	1	-0/+15
\| \| \| \| \| \| \| \| \| \|	The linux,sml-base property stores a raw pointer into the HDAT area. When running the hdat parser tester, the load address of the HDAT will change each time the tool is run, so we need to sanitise the property to get consistent output. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	hw/bt: Add backend interface to disable ipmi message retry option	Vasant Hegde	2019-02-20	3	-1/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	During boot OPAL makes an IPMI_GET_BT_CAPS call to the BMC to get the BT interface capabilities, which include the IPMI message max resend count, message timeout, etc. Most of the time OPAL gets a response from the BMC within the specified timeout. In some corner cases (like mboxd daemon reset in BMC, BMC reboot, etc.) OPAL may not get a response within the timeout period. In such scenarios, OPAL resends the message until the max resend count is reached. OPAL uses synchronous IPMI messages (ipmi_queue_msg_sync()) for a few operations like flash read, write, etc. The thread will wait in OPAL until it gets a response from the BMC. In some corner cases like BMC reboot, the thread may wait in OPAL for a long time (more than 20 seconds), which results in a kernel hard lockup. This patch introduces a new interface to disable the message resend option. We will disable the message resend option for synchronous messages. This will greatly reduce kernel hard lockup issues. This is a short-term fix. The long-term solution is to convert all synchronous messages to asynchronous ones. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	hw/bt: Fix message retry handler	Vasant Hegde	2019-02-19	1	-2/+10
\| \| \| \| \| \| \| \| \| \| \| \|	In some corner cases (like BMC reboot), bt_send_and_unlock() starts message timer, but won't send message to BMC as driver is not free to send message. bt_expire_old_msg() function enables H2B interrupt without actually sending message. This patch fixes above issue. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	hw/test: generalise makefile	Stewart Smith	2019-02-19	1	-9/+9
\| \| \| \|	Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	skiboot v6.0.17 release notes	Stewart Smith	2019-02-20	1	-0/+66
\| \| \| \| \| \|	Signed-off-by: Stewart Smith <stewart@linux.ibm.com> (cherry picked from commit 229ed05931b5e138d240635341b85dce300a8826) Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	skiboot v6.2.1 release notes	Stewart Smith	2019-02-20	1	-0/+83
\| \| \| \| \| \|	Signed-off-by: Stewart Smith <stewart@linux.ibm.com> (cherry picked from commit 0647f2a2c4ebb47937a92d034af41d6848cb1313) Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	platforms/witherspoon: Make PCIe shared slot error message more informative	Andrew Donnellan	2019-02-18	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \|	If we're missing chips for some reason, we print a warning when configuring the PCIe shared slot. The warning doesn't really make it clear what "shared slot" is, and if it's printed, it'll come right after a bunch of messages about NPU setup, so let's clarify the message to explicitly mention PCI. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> [stewart: bikeshed] Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	pflash: Don't try update RO ToC	Andrew Jeffery	2019-02-18	1	-8/+34
\| \| \| \| \| \| \| \|	In the future it's likely the ToC will be marked as read-only. Don't error out by assuming it's writable. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	opal: Deprecate reading the PHB status	Alexey Kardashevskiy	2019-02-18	9	-34/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The OPAL_PCI_EEH_FREEZE_STATUS call takes a bunch of parameters, one of them is @phb_status. It is defined as __be64* and always NULL in the current Linux upstream but if anyone ever decides to read that status, then the PHB3's handler will assume it is struct OpalIoPhb3ErrorData* (which is a lot bigger than 8 bytes) and zero it causing the stack corruption; p7ioc-phb has the same issue. This removes @phb_status from all eeh_freeze_status() hooks and moves the error message from PHB4 to the affected OPAL handlers. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-By: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	phb4: Update some comments	Oliver O'Halloran	2019-02-18	1	-19/+13
\| \| \| \| \| \| \| \| \| \| \| \| \|	I now know what an IODA cache is and I'm not happy about it. With the power of Comments™ you too can share the misery. Remove the big WARNING about the P8 specific hardware bug while we're here. That seems to have been copied over from phb3.c and no one thought about it too hard. Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	phb4: Eliminate peltv_cache	Oliver O'Halloran	2019-02-18	2	-20/+13
\| \| \| \| \| \| \| \| \| \|	The PELT-V is also an in-memory table and there is no reason to have two copies of it. Removing the cache shaves another 128KB off the size of each struct phb4. Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	phb4: Eliminate p->rte_cache	Oliver O'Halloran	2019-02-18	2	-24/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In ancient times we added caches to struct phb3 for some of the IODA tables which can only be accessed indirectly via XSCOM. A cache for the Requester Translation Table (RTT) was also added even though this is an in-memory table. This was carried over to PHB4 when Ben did the initial copy and paste, but it's still largely pointless. There's no real need to have a second copy of the table. This patch removes the "cache" and changes all the users to reference the RTT directly if we need to. This reduces the size of the struct phb4 by 128KB. Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	phb4: Remove pointless NULL checks	Oliver O'Halloran	2019-02-18	1	-12/+2
\| \| \| \| \| \| \| \| \| \|	When we allocate the various in-memory tables we assert() on the allocation. There's no point in checking if the table pointer is NULL or not at runtime. Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	phb4: Rework BDFN filtering in phb4_set_pe()	Oliver O'Halloran	2019-02-18	1	-41/+17
\| \| \| \| \| \| \| \| \| \|	General cleanup. For a function that does nothing more than a mask-and-compare the current implementation is way more convoluted than it has any right to be. Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
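For readers unfamiliar with the term, a mask-and-compare over a BDFN (bus/device/function number) can be sketched as below. This illustrates the general technique only and is not the code in phb4_set_pe(). It assumes the conventional PCI packing of bus in bits 15:8, device in bits 7:3 and function in bits 2:0; `make_bdfn()`, `bdfn_matches()` and the `BDFN_MASK_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Conventional PCI packing: bus[15:8], device[7:3], function[2:0]. */
#define BDFN_MASK_BUS 0xff00u
#define BDFN_MASK_DEV 0x00f8u
#define BDFN_MASK_FN  0x0007u

/* Hypothetical helper: pack bus/device/function into a 16-bit BDFN. */
uint16_t make_bdfn(uint8_t bus, uint8_t dev, uint8_t fn)
{
	assert(dev < 32 && fn < 8);
	return (uint16_t)(((unsigned int)bus << 8) | ((unsigned int)dev << 3) | fn);
}

/*
 * Hypothetical filter: a BDFN matches when it agrees with `filter` on
 * every bit selected by `mask`. Bits outside the mask are ignored.
 */
bool bdfn_matches(uint16_t bdfn, uint16_t filter, uint16_t mask)
{
	return (bdfn & mask) == (filter & mask);
}
```

Passing `BDFN_MASK_BUS` alone matches every function on a bus, while OR-ing in `BDFN_MASK_DEV` and `BDFN_MASK_FN` narrows the match to one device or one function.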
*	devicetree: Add Makefile to build dtb binaries	Reza Arbab	2019-02-18	2	-0/+11
\| \| \| \| \| \| \|	Add a simple Makefile to build external/devicetree/*.dtb. Signed-off-by: Reza Arbab <arbab@linux.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	devicetree: Add p9-simics.dts	Reza Arbab	2019-02-18	1	-0/+16
\| \| \| \| \| \| \|	Add a p9-based devicetree that's suitable for use with Simics. Signed-off-by: Reza Arbab <arbab@linux.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	devicetree: Move power9-phb4.dts	Reza Arbab	2019-02-18	2	-235/+221
\| \| \| \| \| \| \| \| \|	Clean up the formatting of power9-phb4.dts and move it to external/devicetree/p9.dts. This sets us up to include it as the basis for other trees. Signed-off-by: Reza Arbab <arbab@linux.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	devicetree: Add nx node to power9-phb4.dts	Reza Arbab	2019-02-18	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A (non-qemu) p9 without an nx node will assert in p9_darn_init():
	dt_for_each_compatible(dt_root, nx, "ibm,power9-nx")
		break;
	if (!nx) {
		if (!dt_node_is_compatible(dt_root, "qemu,powernv"))
			assert(nx);
		return;
	}
	Since NX is this essential, add it to the device tree. Signed-off-by: Reza Arbab <arbab@linux.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	devicetree: Fix typo in power9-phb4.dts	Reza Arbab	2019-02-18	1	-1/+1
\| \| \| \| \| \| \|	Change "impi" to "ipmi". Signed-off-by: Reza Arbab <arbab@linux.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	devicetree: Fix syntax error in power9-phb4.dts	Reza Arbab	2019-02-18	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	Remove the extra space causing this: Error: power9-phb4.dts:156.15-16 syntax error FATAL ERROR: Unable to parse input tree Signed-off-by: Reza Arbab <arbab@linux.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	libflash/ecc: Fix compilation warning	Vasant Hegde	2019-02-18	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We are hitting below warning on gcc9. gcc -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mcpu=power8 -mtune=power8 -fasynchronous-unwind-tables -fstack-clash-protection -O2 -Wall -Werror -Wno-stringop-truncation -I. -c libflash/ecc.c -o libflash-ecc.o libflash/ecc.c: In function 'memcpy_to_ecc_unaligned': libflash/ecc.c:419:24: error: taking address of packed member of 'struct ecc64' may result in an unaligned pointer value [-Werror=address-of-packed-member] 419 \| memcpy(inc_uint64_by(&ecc_word.data, alignment), src, bytes_wanted); \| ^~~~~~~~~~~~~~ libflash/ecc.c:448:24: error: taking address of packed member of 'struct ecc64' may result in an unaligned pointer value [-Werror=address-of-packed-member] 448 \| memcpy(inc_uint64_by(&ecc_word.data, len), inc_ecc64_by(dst, len), \| ^~~~~~~~~~~~~~ cc1: all warnings being treated as errors Fixes: https://github.com/open-power/skiboot/issues/218 Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
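The warning above fires when the address of a packed member is converted to a pointer of the member's own type. One common way to avoid it, sketched below, is to address the field through a byte pointer and `memcpy()`. This is not necessarily the fix applied in the commit: `struct ecc_word` is a hypothetical stand-in for `struct ecc64`, and `ecc_word_fill()` is an invented helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a packed data+ECC pair. */
struct ecc_word {
	uint64_t data;
	uint8_t ecc;
} __attribute__((packed));

/*
 * Copy `len` bytes into the data field starting at byte offset `off`.
 * The field is reached via a byte pointer plus offsetof(), so no
 * uint64_t pointer to a possibly unaligned member is ever formed.
 */
int ecc_word_fill(struct ecc_word *w, size_t off, const void *src, size_t len)
{
	unsigned char *data;

	assert(w != NULL && src != NULL);

	/* Reject copies that would run past the 8-byte data field */
	if (off > sizeof(w->data) || len > sizeof(w->data) - off)
		return -1;

	data = (unsigned char *)w + offsetof(struct ecc_word, data);
	memcpy(data + off, src, len);
	return 0;
}
```

Reading `w->data` by value remains fine, since the compiler knows the member is packed and emits a suitable access; only taking its address as a `uint64_t *` is the problem.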
*	hdata/i2c: Reduce severity of log message	Vasant Hegde	2019-02-18	1	-1/+1
\| \| \| \| \| \| \| \|	It looks like the WARNING message is resulting in some unnecessary bug reports. Let's reduce the severity to PR_NOTICE. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	ipmi/power: Fix system reboot issue	Vasant Hegde	2019-02-18	1	-2/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The kernel makes a reboot/shutdown OPAL call for reboot/shutdown. Once the kernel gets a response from OPAL it runs opal_poll_events() until firmware handles the request. On BMC based systems, OPAL makes an IPMI call (IPMI_CHASSIS_CONTROL) to initiate system reboot/shutdown. At present OPAL queues IPMI messages and returns SUCCESS to the host. If the BMC is not ready to accept the command (e.g. during a BMC reboot), then these messages will fail, and we have to manually reboot/shutdown the system using the BMC interface. This patch adds logic to validate the message return value. If the message failed, then it will resend the message. At some stage the BMC will be ready to accept the message and handle it. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
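The check-and-resend idea in the commit above can be sketched in isolation as below. This is a simplified, synchronous illustration and not the OPAL IPMI code, which queues messages and handles completion elsewhere. `send_with_retry()`, `send_fn`, `struct flaky` and `flaky_send()` are hypothetical, and the bounded retry count is an assumption of this sketch rather than something the commit message states.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical transport hook: returns 0 on success, non-zero on failure. */
typedef int (*send_fn)(void *priv);

/*
 * Call `send` until it succeeds or `max_tries` attempts have been made.
 * Returns the result of the last attempt (-1 if no attempt was made).
 */
int send_with_retry(send_fn send, void *priv, unsigned int max_tries)
{
	unsigned int i;
	int rc = -1;

	assert(send != NULL);

	for (i = 0; i < max_tries; i++) {
		rc = send(priv);
		if (rc == 0)
			break; /* the receiver accepted the message */
	}
	return rc;
}

/* Example transport that fails a fixed number of times, like a rebooting BMC. */
struct flaky {
	unsigned int failures_left;
	unsigned int calls;
};

int flaky_send(void *priv)
{
	struct flaky *f = priv;

	f->calls++;
	if (f->failures_left > 0) {
		f->failures_left--;
		return -1;
	}
	return 0;
}
```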
*	pflash: Support encoding/decoding ECC'd partitions	Stewart Smith	2019-02-14	3	-10/+41
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With the new --ecc option, pflash can add/remove ECC when reading/writing flash partitions protected by ECC. This is not flawless with current PNORs out in the wild though, as they do not typically fill the whole partition with valid ECC data, so you have to know how big the valid ECC'd data is and specify the size manually. Note that for some partitions this is practically impossible without knowing the details of the content of the partition. A future patch is likely to introduce an option to "stop reading data when ECC starts failing and assume everything is okay rather than error out" to support reading the "valid" data from existing PNOR images. Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	firmware-versions: Add test case for parsing VERSION	Stewart Smith	2019-02-13	18	-147/+364
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Also make it possible to use with afl-lop/afl-fuzz just to help make sure we're all good. Additionally, if we hit an entry in VERSION that is larger than our buffer size, we skip over it gracefully rather than overwriting the stack. This is only a problem if VERSION isn't trusted, which as of 4b8cc05a94513816d43fb8bd6178896b430af08f it is verified as part of Secure Boot. CC: stable # v5.9+ Fixes: 9727fe384b8685270d344201f7e051475eea3a0b [stewart: fix up include ordering for building on centos7] Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
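Skipping an oversized entry instead of copying it into a fixed-size buffer can be sketched as follows. This is an illustration of the general pattern, not the skiboot VERSION parser: `ENTRY_MAX`, `for_each_entry()`, `entry_cb` and `count_entry()` are hypothetical, and newline-separated entries are an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ENTRY_MAX 16 /* hypothetical buffer size, including the NUL byte */

typedef void (*entry_cb)(const char *entry, void *priv);

/*
 * Walk newline-separated entries in `data`, passing each one that fits
 * in the local buffer to `cb`. Entries that do not fit are skipped, never
 * copied. Returns the number of entries skipped.
 */
size_t for_each_entry(const char *data, size_t len, entry_cb cb, void *priv)
{
	char buf[ENTRY_MAX];
	size_t skipped = 0;
	size_t pos = 0;

	assert(data != NULL && cb != NULL);

	while (pos < len) {
		const char *start = data + pos;
		const char *nl = memchr(start, '\n', len - pos);
		size_t n = nl ? (size_t)(nl - start) : len - pos;

		if (n >= sizeof(buf)) {
			skipped++; /* too large: step over it without copying */
		} else if (n > 0) {
			memcpy(buf, start, n);
			buf[n] = '\0';
			cb(buf, priv);
		}
		pos += n + (nl ? 1 : 0);
	}
	return skipped;
}

/* Example callback: counts the entries it is handed. */
void count_entry(const char *entry, void *priv)
{
	(void)entry;
	(*(size_t *)priv)++;
}
```

The length check happens before any copy, so the size of an entry in untrusted input can never exceed what `buf` holds.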
*	doc: Adjusting tags in release notes to eliminate global conflict for singlehtml builds	Jeff Scheel	2019-02-13	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Change the shared-slot-rn and capi2-rn tags to avoid conflicts with the tags in skiboot-5.7.rst. For the singlehtml builds to work correctly, the same tags cannot be used globally in any files. These two tags, shared-slot-rn and capi2-rn, also appear in skiboot-5.7.rst. To eliminate the conflict, the suffix "-rc1" was added to both the definition and use of each tag. Signed-off-by: Jeff Scheel <scheel@us.ibm.com> [stewart: use 5.7 in tag just to be complete] Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	core/exceptions: implement support for MCE interrupts in powersave	Nicholas Piggin	2019-02-13	5	-16/+65
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The ISA specifies that MCE interrupts in power saving modes will enter at 0x200 with powersave bits in SRR1 set. This is not currently supported properly, the MCE will just happen like a normal interrupt, but GPRs could be lost, which would lead to crashes (e.g., r1, r2, r13 etc). So check the power save bits similarly to the sreset vector, and handle this properly. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	core/exceptions: allow recoverable sreset exceptions	Nicholas Piggin	2019-02-13	6	-19/+55
\| \| \| \| \| \| \| \|	This requires implementing the MSR[RI] bit. Then just allow all non-fatal sreset exceptions to recover. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	core/exceptions: implement an exception handler for non-powersave sresets	Nicholas Piggin	2019-02-13	5	-11/+77
\| \| \| \| \| \| \| \|	Detect non-powersave sresets and send them to the normal exception handler which prints registers and stack. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	asm/head: sreset handler remove FIXUP_ENDIAN	Nicholas Piggin	2019-02-13	2	-15/+12
\| \| \| \| \| \| \| \| \| \| \| \|	Remove FIXUP_ENDIAN from the normal sreset handler (not the fast reboot handler), to prevent it from trashing registers and CFAR. This means sreset can be used to report a reliable register dump, and even be recoverable. A watchdog could be implemented to catch and diagnose stuck CPUs during boot using sreset. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	core/fast-reboot: fast reboot specific sreset patch	Nicholas Piggin	2019-02-13	4	-24/+48
\| \| \| \| \| \| \| \| \| \| \| \|	Provide an sreset handler specifically for fast reboots, which allows FIXUP_ENDIAN to be removed from the normal sreset handler in the next patch. The save_1 == 0 condition is no longer required to signal a fast reboot. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	asm/head: provide asm support for interrupts to be returned from	Nicholas Piggin	2019-02-13	3	-20/+79
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds the redzone to the interrupt stack, and code to restore registers. This can be used for a number of things. Initially it will be used to recover from system reset interrupts, it could later be used to handle recoverable machine checks, use the decrementer to implement a watchdog, handle HMI interrupts at boot, and to implement virtual memory. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	core/init: enable machine check on secondaries	Nicholas Piggin	2019-02-13	2	-0/+40
\| \| \| \| \| \| \| \| \| \|	Secondary CPUs currently run with MSR[ME]=0 during boot, which means if they take a machine check, the system will checkstop. Enable ME where possible and allow them to print registers. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	core/fast-reboot: improve NMI handling during fast reset	Nicholas Piggin	2019-02-13	1	-0/+34
\| \| \| \| \| \| \| \| \| \| \|	Improve sreset and MCE handling in fast reboot. Switch the HILE bit off before copying OPAL's exception vectors, so NMIs can be handled properly. Also disable MSR[ME] while the vectors are being overwritten. Some of the remaining problem cases are documented in comments. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	core/init: rearrange final boot steps	Nicholas Piggin	2019-02-13	1	-12/+11
\| \| \| \| \| \| \| \| \|	Take secondaries out of sleep mode as late as possible, which tends to help with simulator boot speeds. Make give_self_os() the last step before starting the kernel, which matches the way secondaries behave. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	asm/head: use HSRR exception registers in FIXUP_ENDIAN	Nicholas Piggin	2019-02-12	1	-11/+12
\| \| \| \| \| \| \| \| \| \| \| \| \|	Taken from the Linux FIXUP_ENDIAN_HV macro, use the HSRR registers in FIXUP_ENDIAN. This allows the 0x100 exception handler (the single user of the macro) to preserve SRR registers and potentially recover, debug, or do something useful with them. This also allows the macro to be used in code with MSR[RI]=1, if the need arises. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	core/exceptions: save current MSR in exception frame	Nicholas Piggin	2019-02-12	4	-26/+35
\| \| \| \| \| \| \| \| \|	Save and print the MSR of the interrupt context. This can be derived from the interrupt type, SRR1, and other system register settings. But it can be useful to quickly verify what's happening. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	core/cpu: do not inline cpu_relax	Nicholas Piggin	2019-02-12	2	-11/+13
\| \| \| \| \| \| \| \| \|	The added nops now push it up in size, and -Os uninlines it for every compilation unit that calls it more than once, so it's much better to just uninline. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	core/init: rename setup_reset_vector	Nicholas Piggin	2019-02-12	3	-4/+4
\| \| \| \| \| \| \|	Use the word copy, to match copy_exception_vectors. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	external/mambo: allow CPU targeting for most debug utils	Nicholas Piggin	2019-02-12	1	-97/+229
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Debug util functions target CPU 0:0:0 by default. Some can be overridden explicitly per invocation, and others can't at all. Even for those that can be overridden, it is a pain to type them out when you're debugging a particular thread. Provide a new 'target' function that allows the default CPU target to be changed. Wire up that default to all other utils. Provide a new 'S' step command which only steps the target CPU. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	core/cpu: HID update race	Nicholas Piggin	2019-02-12	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \|	If the per-core HID register is updated concurrently by multiple threads, updates can get lost. This has been observed during fast reboot where the HILE bit does not get cleared on all cores, which can cause machine check exception interrupts to crash. Fix this by only updating HID on thread0. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>