path: root/core
Commit message | Author | Age | Files | Lines
...
* core/i2c: Remove bus specific alloc and free callbacks | Oliver O'Halloran | 2018-09-17 | 1 | -8/+9
  These are now pointless and they can be replaced with zalloc() and free().

  Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* core/i2c: Move the timeout field into i2c_request | Oliver O'Halloran | 2018-09-17 | 1 | -1/+2
  Currently to set a per-request timeout you need to use i2c_req_set_timeout() which is a wrapper for a per-bus method that sets the actual timeout. This design doesn't make a whole lot of sense, so move the timeout field into the generic i2c_request structure and have the timeout set through that field.

  Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* opal/hmi: Ignore debug trigger inject core FIR. | Mahesh Salgaonkar | 2018-09-17 | 1 | -1/+0
  Core FIR[60] is a side effect of the workaround for the CI Vector Load issue in DD2.1. Usually this gets delivered as an HMI with HMER[17], where Linux already ignores it. But it looks like in some cases we may happen to see CORE_FIR[60] while we are already in Malfunction Alert HMI (HMER[0]) due to other reasons, e.g. CAPI recovery or NPU xstop. If that happens then just ignore it instead of crashing the kernel as not recoverable.

  Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
  Reviewed-by: Vaibhav Jain <vaibhav@linux.ibm.com>
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* opal: use for_each_safe to iterate over opal_syncers | Vaibhav Jain | 2018-09-17 | 1 | -2/+2
  Presently a fault will happen in opal_sync_host_reboot if a callback tries to remove itself from the opal_syncers list by calling opal_del_host_sync_notifier. This happens because iteration over opal_syncers is done using list_for_each(), which doesn't preserve list_node->next. So when the current opal_syncers callback removes itself from the list, current node contents are lost and the current_node->next pointer is rendered invalid.

  To fix this we simply switch from list_for_each() to list_for_each_safe(), which keeps current_node->next cached, so even if the current node is freed, iteration over subsequent nodes can still continue.

  Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
  Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
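The difference between the two iteration styles can be sketched in a few lines of C. This is an illustration only: `struct node`, `unlink_node()`, `walk_safe()` and `walk_unsafe()` are hypothetical names and a hand-rolled singly linked list, not skiboot's CCAN list macros. Unlinking clears the removed node's `next` pointer to stand in for the "contents are lost" case without invoking undefined behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal list node; skiboot itself uses the CCAN list type. */
struct node {
	struct node *next;
	int id;
	int remove_self;	/* non-zero: the "callback" unlinks this node */
};

/* Unlink n from the list at *head and clear its next pointer, standing in
 * for a node whose contents are no longer valid after removal. */
static void unlink_node(struct node **head, struct node *n)
{
	struct node **pp = head;

	while (*pp && *pp != n)
		pp = &(*pp)->next;
	if (*pp) {
		*pp = n->next;
		n->next = NULL;
	}
}

/* Safe walk: cache the successor before the callback can touch the node.
 * Returns the number of nodes visited. */
static int walk_safe(struct node **head)
{
	struct node *cur, *next;
	int visited = 0;

	for (cur = *head; cur; cur = next) {
		next = cur->next;	/* cached before any removal */
		visited++;
		if (cur->remove_self)
			unlink_node(head, cur);
	}
	return visited;
}

/* Unsafe walk: reads cur->next after the callback ran, so a self-removing
 * node cuts the iteration short (or faults, if the node had been freed). */
static int walk_unsafe(struct node **head)
{
	struct node *cur;
	int visited = 0;

	for (cur = *head; cur; cur = cur->next) {
		visited++;
		if (cur->remove_self)
			unlink_node(head, cur);
	}
	return visited;
}
```

With a three-node list whose middle node removes itself, the safe walk still reaches the last node while the unsafe walk stops after the second.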
* hw/npu2, platform: Restructure OpenCAPI i2c reset/presence pins | Andrew Donnellan | 2018-09-17 | 1 | -10/+14
  In platform_ocapi, we define i2c_{reset,presence}_odl{0,1} to specify the appropriate reset/presence GPIO pins for devices connected to ODL0 and ODL1 respectively.

  This is obviously wrong, because a device connected to brick 2 and a device connected to brick 4 are going to be different devices connected to different I2C pins, but rather conveniently we haven't had to deal with systems that can use the full 4 bricks as yet.

  Now that we're adding OpenCAPI support for Witherspoon, we should change this to specify pins separately for all 4 bricks. Replace i2c_{reset,presence}_odl{0,1} with i2c_{reset,presence}_brick{2,3,4,5} and update the presence detection code, device reset code, and existing platforms accordingly.

  Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
  Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
  Reviewed-by: Oliver O'Halloran <oohall@gmail.com>
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* hw/npu2, platform: Add NPU2 platform device detection callback | Andrew Donnellan | 2018-09-17 | 1 | -0/+2
  There is no standardised way to determine the presence and type of devices connected to an NPU on POWER9. Currently, we hardcode device types based on platform type (as no platform currently supports both OpenCAPI and NVLink), and for OpenCAPI platforms we use I2C to detect presence.

  Witherspoon (and potentially other platforms later on) supports both NVLink and OpenCAPI, and additionally uses SXM2 connectors which can carry more than one link, rather than the SlimSAS connectors used for OpenCAPI on Zaius and ZZ. This necessitates some special handling.

  Add a platform callback for NPU device detection. In a later patch, we will use this to implement Witherspoon-specific device detection. For now, add a Witherspoon stub that sets all links to NVLink (i.e. current behaviour).

  Move the existing I2C-based presence detection for OpenCAPI devices on Zaius/ZZ into common code, which we use by default for platforms which do not define a callback.

  Clean up the use of the ibm,npu-link-type property, which will now be exposed solely for debugging and not consumed internally.

  Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
  Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* hw/npu2: Common NPU2 init routine between NVLink and OpenCAPI | Andrew Donnellan | 2018-09-17 | 1 | -2/+0
  Replace probe_npu2() and probe_npu2_opencapi() with a new shared probe_npu2(). Refactor some of the common NPU setup code into shared code. No functional change.

  This patch does not implement support for using both types of devices simultaneously on the same NPU - we expect to add this sometime in the future.

  Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
  Acked-by: Reza Arbab <arbab@linux.ibm.com>
  Reviewed-by: Alistair Popple <alistair@popple.id.au>
  Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* TEMPORARY HACK: Disable verifying VERSION | Stewart Smith | 2018-09-13 | 1 | -1/+6
  Seeing as all the VERSION signing code is taking way too long to get upstream, let's temporarily skip verifying VERSION for now.

  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* core/cpu: Fix memory allocation for job array | Vaidyanathan Srinivasan | 2018-09-13 | 1 | -2/+2
  This bug would result in a boot hang on some configurations due to cpu_wait_job() endlessly waiting for the last bogus jobs[cpu->pir] pointer.

  Fixes: 7a3f307e (core/cpu: parallelise global CPU register setting jobs)
  Reported-by: Stephanie Swanson <swanman@us.ibm.com>
  Reported-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
  Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
  Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
  Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
  Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* cpu: Better output when waiting for a very long job | Benjamin Herrenschmidt | 2018-08-16 | 1 | -0/+5
  Instead of printing at the end if the job took more than 1s, print in the loop every 30s along with a backtrace. This will give us some output if the job is deadlocked.

  Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  [stewart: bump to 30s rather than 5s, preserve PR_DEBUG for >1s]
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* lock: Fix interactions between lock dependency checker and stack checker | Benjamin Herrenschmidt | 2018-08-16 | 2 | -15/+37
  The lock dependency checker does a few nasty things that can cause re-entrancy deadlocks in conjunction with the stack checker or in fact other debug tests.

  A lot of it revolves around taking a new lock (dl_lock) as part of the locking process.

  This tries to fix it by making sure we do not hit the stack checker while holding dl_lock.

  We achieve that in part by directly using the low-level __try_lock and manually unlocking on the dl_lock, and making some functions "nomcount".

  In addition, we mark the dl_lock as being in the console path to avoid deadlocks with the UART driver.

  We move the enabling of the deadlock checker to a separate config option from DEBUG_LOCKS as well, in case we choose to disable it by default later on.

  Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* lock: Move code around | Benjamin Herrenschmidt | 2018-08-16 | 1 | -39/+39
  This moves __try_lock() and lock_timeout() as a preparation for the next patch. No code change.

  Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* lock: Increase con_suspend before __try_lock | Benjamin Herrenschmidt | 2018-08-16 | 1 | -2/+4
  Otherwise, we might have the lock and hit prlog's inside __try_lock() in the list check (among others) in debug builds.

  Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* i2c: Ensure ordering between i2c_request_send() and completion | Benjamin Herrenschmidt | 2018-08-15 | 1 | -0/+3
  i2c_request_send loops waiting for a flag "uc.done" set by the completion routine, and then looks for a result code also set by that same completion.

  There is no synchronization and the completion can happen on another processor, so we need to order the stores to uc and the reads from uc so that uc.done is stored last and tested first, using memory barriers.

  Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
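The "stored last, tested first" ordering described above can be sketched with C11 atomics. This is an assumption-laden illustration: skiboot uses its own barrier primitives rather than `<stdatomic.h>`, and `struct sync_userdata`, `completion()` and `poll_done()` are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the structure shared between the sender and
 * the completion routine. */
struct sync_userdata {
	int rc;			/* result code, written by the completion */
	atomic_bool done;	/* flag the sender polls */
};

/* Completion side: write the result first, then publish the flag with
 * release ordering so the rc store cannot be reordered after it. */
static void completion(struct sync_userdata *ud, int rc)
{
	ud->rc = rc;
	atomic_store_explicit(&ud->done, true, memory_order_release);
}

/* Sender side: test the flag first with acquire ordering; only once it is
 * observed set is it safe to read rc. Returns true when the request is
 * complete and *rc has been filled in. */
static bool poll_done(struct sync_userdata *ud, int *rc)
{
	if (!atomic_load_explicit(&ud->done, memory_order_acquire))
		return false;
	*rc = ud->rc;
	return true;
}
```

The release store pairs with the acquire load: a sender that sees `done` set is guaranteed to also see the result code written before it.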
* i2c: Fix multiple-enqueue of the same request on NACK | Benjamin Herrenschmidt | 2018-08-15 | 1 | -4/+3
  i2c_request_send() will retry the request if the error is a NAK, however it forgets to clear the "ud.done" flag.

  It will thus loop again and try to re-enqueue the same request causing internal request list corruption.

  Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
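The shape of the retry loop after such a fix might look like the sketch below. Everything here is hypothetical: `struct fake_req`, `fake_enqueue()`, `fake_poll()`, `send_with_retry()` and the `RC_*` values simulate a bus that NAKs a configurable number of times, and are not taken from skiboot's i2c code. The point is only that the done flag is reset before each enqueue, so the wait loop really waits for the new attempt.

```c
#include <assert.h>
#include <stdbool.h>

#define RC_OK	0
#define RC_NAK	(-1)	/* hypothetical error code for this sketch */

/* Hypothetical request plus a simulated bus that NAKs naks_left times. */
struct fake_req {
	bool done;
	int rc;
	bool queued;
	int naks_left;
	int double_enqueues;	/* counts enqueues of an already queued request */
};

static void fake_enqueue(struct fake_req *r)
{
	if (r->queued)
		r->double_enqueues++;
	r->queued = true;
}

/* Simulated completion: dequeue the request and report NAK or success. */
static void fake_poll(struct fake_req *r)
{
	if (!r->queued)
		return;
	r->queued = false;
	if (r->naks_left > 0) {
		r->naks_left--;
		r->rc = RC_NAK;
	} else {
		r->rc = RC_OK;
	}
	r->done = true;
}

/* Send with up to `retries` retries on NAK. Returns the last result code. */
static int send_with_retry(struct fake_req *r, int retries)
{
	int attempt;

	for (attempt = 0; attempt <= retries; attempt++) {
		r->done = false;	/* reset before every (re-)enqueue */
		fake_enqueue(r);
		while (!r->done)
			fake_poll(r);
		if (r->rc != RC_NAK)
			break;
	}
	return r->rc;
}
```

Without the reset, a stale `done` from the previous attempt would let the loop skip the wait and enqueue the request again while it is still queued.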
* opal/hmi: Catch NPU2 HMIs for opencapi | Frederic Barrat | 2018-08-13 | 1 | -5/+10
  HMIs for NPU2 are filtered with the 'compatible' string of the PHB, so add opencapi to the mix.

  Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
  Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* core/cpu: Call memset with proper cpu_thread offset | Vasant Hegde | 2018-08-13 | 1 | -1/+1
  "cpu_thread *t + value" vs "(void *)t + val"

  Fixes: cfe9d441 (core/cpu: Prevent clobbering of stack guard for boot-cpu)
  CC: stable <skiboot@lists.ozlabs.org> # v6.0+
  CC: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
  CC: Nicholas Piggin <npiggin@gmail.com>
  CC: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
  Acked-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
  Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
  Reviewed-by: Vaibhav Jain <vaibhav@linux.ibm.com>
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
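The one-line message above contrasts typed and byte-wise pointer arithmetic. A minimal sketch of why that matters for a partial memset follows; `struct thread_state`, `clear_after_guard()` and `byte_distance()` are hypothetical stand-ins, and the sketch casts to `char *` (portable C) where the commit message uses the GNU `void *` arithmetic extension.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct cpu_thread: one leading field that must
 * survive the clear, followed by state that should be zeroed. */
struct thread_state {
	unsigned long stack_guard;
	int a;
	int b;
	char buf[32];
};

/* Zero everything in *t after stack_guard.
 * "t + off" would advance by off * sizeof(*t) bytes, because arithmetic on
 * a struct pointer is scaled by the struct size. Casting to char * makes
 * "+ off" advance by exactly off bytes. */
static void clear_after_guard(struct thread_state *t)
{
	size_t off = offsetof(struct thread_state, a);

	memset((char *)t + off, 0, sizeof(*t) - off);
}

/* Distance in bytes between two pointers into the same array. */
static size_t byte_distance(const void *from, const void *to)
{
	return (size_t)((const char *)to - (const char *)from);
}
```

Incrementing a `struct thread_state *` by one moves it `sizeof(struct thread_state)` bytes, which `byte_distance()` can be used to confirm.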
* pci: Move logging macros to pci.h | Reza Arbab | 2018-08-06 | 1 | -21/+0
  Move the PCI{TRACE,DBG,NOTICE,ERR} logging macros from pci.c to pci.h so they can be used in other files.

  Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
  Reviewed-by: Alistair Popple <alistair@popple.id.au>
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* core/pci: Print 'PCI Summary' at PR_NOTICE | Oliver O'Halloran | 2018-08-02 | 1 | -1/+1
  The actual entries of the PCI Summary are printed at PR_NOTICE so that they go to the console during boot. The header, however, is not, which breaks my patented "grep 'PCI Summary' -A 100" technique for scraping the summary out of a log file when that log is recorded from the SOL console.

  Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* pci: Clarify power down logic | Oliver O'Halloran | 2018-08-01 | 1 | -2/+6
  Currently pci_scan_bus() unconditionally calls pci_slot_set_power_state() when it's finished scanning a bus. This is one of those things that makes you go "WHAT?" when you first see it and frankly the skiboot PCI code could do with less of that.

  Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* mem_region: Merge similar allocations when dumping | Oliver O'Halloran | 2018-08-01 | 1 | -7/+42
  Currently we print one line for each allocation done at runtime when dumping the memory allocations. We do a few thousand allocations at boot so this can result in a huge amount of text being printed, which a) is slow to print, and b) can result in the log buffer overflowing, which destroys otherwise useful information.

  This patch adds de-duplication to this memory allocation dump by merging "similar" allocations (same location, same size) into one.

  Unfortunately, the algorithm used to do the de-duplication is quadratic, but considering we only dump the allocations in the event of a fatal error I think this is acceptable. I also did some benchmarking and found that on a ZZ it takes ~3ms to do a dump with 12k allocations. On a Zaius it's slightly longer at ~10ms for 10k allocs. However, the difference there was due to the output being written to the UART.

  This patch also bumps the log level to PR_NOTICE. PR_INFO messages are suppressed at the default log level, which probably isn't something you want considering we only dump the allocations when we run out of skiboot heap space.

  Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
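A quadratic merge of "similar" allocations (same location, same size) can be sketched as below. This is not the code from the patch: `struct alloc_rec`, `struct alloc_summary` and `merge_similar()` are hypothetical names, and the sketch copies into a separate summary array for clarity, which the real dump path may well avoid.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record of one allocation: where it was made and its size. */
struct alloc_rec {
	const char *location;
	size_t size;
};

/* One summary line: a (location, size) pair and how often it occurred. */
struct alloc_summary {
	const char *location;
	size_t size;
	unsigned int count;
};

/* Merge records with identical location and size into summary entries.
 * out must have room for n entries. For every record the existing
 * summaries are scanned linearly, so the cost is O(n^2) in the worst case.
 * Returns the number of summary entries written. */
static size_t merge_similar(const struct alloc_rec *recs, size_t n,
			    struct alloc_summary *out)
{
	size_t i, j, used = 0;

	for (i = 0; i < n; i++) {
		for (j = 0; j < used; j++) {
			if (out[j].size == recs[i].size &&
			    strcmp(out[j].location, recs[i].location) == 0)
				break;
		}
		if (j == used) {	/* first time this pair is seen */
			out[used].location = recs[i].location;
			out[used].size = recs[i].size;
			out[used].count = 0;
			used++;
		}
		out[j].count++;
	}
	return used;
}
```

A dump would then print one line per summary entry with its count instead of one line per allocation.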
* core/cpu.c: assert pir is sane before using | Stewart Smith | 2018-07-20 | 1 | -0/+1
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* Fixup unit tests for cpu_queue_job() in mem_region.c | Stewart Smith | 2018-07-18 | 3 | -3/+12
  Fixes: 06808a037d44231ba36e814ff1dbf66bc8b707da
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* core/pci-quirk: Clean up commented code in quirk_astbmc_vga() | Andrew Jeffery | 2018-07-17 | 1 | -13/+0
  Also remove the comment associated with the commented code.

  Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* core/pci-quirk: Remove broken comment in quirk_astbmc_vga() | Andrew Jeffery | 2018-07-17 | 1 | -15/+0
  The comment talks about one mechanism to handle the quirk whilst the code uses another. Avoid confusion by removing the comment.

  Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* core/flash: Emit a warning if Skiboot version doesn't match | Samuel Mendoza-Jonas | 2018-07-17 | 1 | -0/+4
  Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* Recognise signed VERSION partition | Samuel Mendoza-Jonas | 2018-07-17 | 2 | -1/+16
  A few things need to change to support a signed VERSION partition:

  - A signed VERSION partition will be 4K + SECURE_BOOT_HEADERS_SIZE (4K).
  - The VERSION partition needs to be loaded after secure/trusted boot is set up, and therefore after nvram_init().
  - Added to the trustedboot resources array.

  This also moves the ipmi_dt_add_bmc_info() call to after flash_dt_add_fw_version() since it adds info to ibm,firmware-versions.

  Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* mem_check(): Correct alignment assumptions | Stewart Smith | 2018-07-16 | 1 | -2/+2
  Back in the dim dark past, mem_check() was written on the assumption that mem regions need to be sizeof(alloc_hdr) aligned. I can't see any real reason for this, so change it to sizeof(long) aligned as we count space by number of longs, so at least that kind of makes sense.

  We hit this assert in a future patch when preserving BOOTKERNEL across fast reboots.

  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* mem_region: log region name on mem_alloc failure | Stewart Smith | 2018-07-16 | 1 | -2/+2
  This can help with debugging when trying to do node local allocations.

  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* Scan PCI and clear memory simultaneously | Stewart Smith | 2018-07-16 | 2 | -18/+38
  For many systems, scanning PCI takes about as much time as zeroing all of RAM, so we may as well do them at the same time and cut a few seconds off the total fast reboot time.

  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* fast-reboot: parallel memory clearing | Stewart Smith | 2018-07-16 | 2 | -3/+99
  Arbitrarily pick 16GB as the unit of parallelism, and split up clearing memory into jobs and schedule them node-local to the memory (or on node 0 if we can't work that out because it's the memory up to SKIBOOT_BASE).

  This seems to cut at least ~40% of the time from memory zeroing on fast-reboot on a 256GB Boston system.

  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
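Splitting a range into 16GB units can be sketched as follows. This only covers the chunking arithmetic; `struct clear_job`, `split_range()` and `CHUNK_SIZE` are hypothetical names, and the node-local scheduling of each job described in the message is not modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHUNK_SIZE	(16ULL << 30)	/* 16GB unit of parallelism */

/* Hypothetical description of one memory-clearing job. */
struct clear_job {
	uint64_t start;
	uint64_t len;
};

/* Split [start, start + len) into jobs of at most CHUNK_SIZE bytes.
 * Fills at most max_jobs entries and stops early if jobs[] is full, so a
 * caller should size the array for the largest range it expects.
 * Returns the number of jobs written. */
static size_t split_range(uint64_t start, uint64_t len,
			  struct clear_job *jobs, size_t max_jobs)
{
	size_t n = 0;

	while (len > 0 && n < max_jobs) {
		uint64_t this_len = len < CHUNK_SIZE ? len : CHUNK_SIZE;

		jobs[n].start = start;
		jobs[n].len = this_len;
		n++;
		start += this_len;
		len -= this_len;
	}
	return n;
}
```

For example, a 40GB range starting at address 0 yields three jobs of 16GB, 16GB and 8GB, each of which could then be queued on a CPU local to that memory.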
* cpu: add cpu_queue_job_on_node() | Nicholas Piggin | 2018-07-15 | 12 | -70/+170
  Add a job scheduling API which will run the job on the requested chip_id (or return failure).

  Includes test harness fixes from Stewart.

  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* cpu: Ensure no-return flag is updated for current cpu_thread | Vaibhav Jain | 2018-07-10 | 1 | -0/+2
  Presently, if a cpu_thread queues a non-returning job on itself, the variable cpu_thread.job_has_no_return is never updated and other cpu_threads can still queue a job on it without triggering any warnings.

  So this patch updates __cpu_queue_job() to ensure that job_has_no_return is updated on the current cpu_thread before it branches to job->func(). So if the current job is non-returning then other cpu_threads queuing a job on this cpu will trigger a warning.

  This should aid in debugging some skiboot deadlocks.

  Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* core: always flush console before stopping | Nicholas Piggin | 2018-07-04 | 2 | -2/+6
  This catches a few cases (e.g., fast reboot failure messages) that don't always make it to the console before the machine is rebooted.

  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* core/cpu: parallelise global CPU register setting jobs | Nicholas Piggin | 2018-07-04 | 1 | -10/+37
  On a 176 thread system, before:

    [ 122.319923233,5] OPAL: Switch to big-endian OS
    [ 126.317897467,5] OPAL: Switch to little-endian OS

  after:

    [ 212.439299889,5] OPAL: Switch to big-endian OS
    [ 212.469323643,5] OPAL: Switch to little-endian OS

  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* occ: Move occ declarations into occ.h | Andrew Donnellan | 2018-06-27 | 4 | -0/+4
  OCC declarations are currently split between skiboot.h and occ-sensor.h. Given the growing unwieldiness of skiboot.h it's probably time to move it all into one header.

  Rename occ-sensor.h to occ.h, move all OCC-related declarations out of skiboot.h, and add #includes as necessary.

  Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
  Reviewed-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* init, occ: Initialise OCC earlier on BMC systems | Andrew Donnellan | 2018-06-27 | 1 | -1/+13
  We need to use the OCC to obtain presence data for the SXM2 slots on Witherspoon systems. This is needed to determine device type for NVLink GPUs and OpenCAPI devices which can be plugged into the same slot. Support for this will be implemented in a future patch.

  Currently, OCC initialisation is done just before handing over to Linux, which is well after NPU probe. On FSP systems, OCC boot starts very late, so we wait until the last possible moment to initialise the skiboot side in order to give it the maximum time to boot. On BMC systems, OCC boot starts earlier, so there aren't any issues in moving it earlier in the skiboot init sequence.

  When running on a BMC machine, call occ_pstates_init() as early as possible in the init sequence. On FSP machines, continue to call it from its current location.

  Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
  Reviewed-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* hw/npu2, core/hmi: Use NPU instead of NPU2 as log message prefix | Andrew Donnellan | 2018-06-27 | 1 | -3/+3
  The NPU2{DBG,INF,ERR} macros use "NPU%d" as a prefix to identify messages relating to a particular NPU.

  It's slightly confusing to have per-NPU messages prefixed with "NPU0" or "NPU1" and NPU-generic messages prefixed with "NPU2". On some future system we could potentially have an NPU #2, in which case it'd be really confusing.

  Use NPU rather than NPU2 for NPU-generic log messages. There's no risk of confusion with the original npu.c code since that's only for P8.

  Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
  Acked-by: Reza Arbab <arbab@linux.ibm.com>
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* timebase: Remove unused remaining time calculation | Joel Stanley | 2018-06-18 | 1 | -1/+0
  In db9c1422002c ("Improve cpu_idle when PM is disabled") the time_wait_poll calculation was modified to calculate the remaining time on each loop. Because of this we don't need to decrement remaining any more.

  Found by scan-build.

  Signed-off-by: Joel Stanley <joel@jms.id.au>
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* Split debug_descriptor out into own include file | Stewart Smith | 2018-06-18 | 3 | -0/+3
  We only touch it in limited places, let's simplify skiboot.h

  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* Blockchain isn't the only data structure deserving of love | Stewart Smith | 2018-06-18 | 1 | -1/+1
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
  Acked-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
  Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* core: Add test for PCI quirks | Andrew Jeffery | 2018-06-18 | 3 | -4/+75
  Ensure that quirks are run (or not) for given PCI vendor and device IDs. This tests the quirk infrastructure and the PCI_VENDOR_ID() and PCI_DEVICE_ID() macros, the latter of which was recently found to be broken.

  Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* Quieten console output on boot | Stewart Smith | 2018-06-05 | 2 | -3/+3
  We print out a whole bunch of things on boot, most of which aren't interesting, so we should *not* print them instead.

  Printing things like what CPUs we found and what PCI devices we found *are* useful, so continue to do that. But we don't need to splat out a bunch of things that are always going to be true.

  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* opal/hmi: Display correct chip id while printing NPU FIRs. | Mahesh Salgaonkar | 2018-06-04 | 1 | -4/+4
  HMIs for NPU xstops are broadcast to all chips. All cores on all the chips receive the HMI. The HMI handler correctly identifies and extracts the NPU FIR details from the affected chip, but while printing FIR data it prints the chip id and location code details of this_cpu()->chip_id, which may not be correct. This patch fixes this issue.

  CC: stable # v6.0+
  Fixes: 7bcbc78c ("Add location code to NPU2 HMI logging")
  Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
  [stewart: add fixes and cc stable]
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* npu2-opencapi: Rework adapter reset | Frederic Barrat | 2018-06-01 | 1 | -6/+6
  Rework the code that resets the opencapi adapter a bit:

  - make clearer which i2c pin is resetting which device
  - break the reset operation into smaller chunks. This is really to prepare for a future patch.

  No functional changes.

  Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
  Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* npu2-opencapi: Use presence detection | Frederic Barrat | 2018-06-01 | 1 | -1/+10
  Presence detection is not part of the opencapi specification, so each platform may choose to implement it the way it wants.

  All current platforms implement it through an i2c device where we can query a pin to know if a device is connected or not. ZZ and Zaius have a similar design and even use the same i2c information and pin numbers. However, presence detection on older ZZ planar (older than v4) doesn't work, so we don't activate it for now, until our lab systems are upgraded and it's better tested.

  Presence detection on Witherspoon is still being worked on. It's shaping up to be quite different, so we may have to revisit the topic in a later patch.

  Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
  Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* core/test/run_mem_region: fix GCC8 compile error | Stewart Smith | 2018-05-29 | 3 | -3/+3
  error: ‘const’ attribute on function returning ‘void’ [-Werror=attributes]

  Fix by not putting the const attribute in the stub

  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* core/console: fix deadlock when printing with console lock held | Nicholas Piggin | 2018-05-24 | 1 | -4/+8
  Some debugging options will print while the console lock is held, which is why the console lock is taken as a recursive lock. However console_write calls __flush_console, which will drop and re-take the lock non-recursively in some cases.

  Just set con_need_flush and return from __flush_console if we are holding the console lock already.

  This stack usage message (taken with this patch applied) could lead to a deadlock without this:

    CPU 0000 lowest stack mark 11768 bytes left pc=300cb808 token=0
    CPU 0000 Backtrace:
     S: 0000000031c03370 R: 00000000300cb808   .list_check_node+0x1c
     S: 0000000031c03410 R: 00000000300cb910   .list_check+0x38
     S: 0000000031c034b0 R: 00000000300190ac   .try_lock_caller+0xb8
     S: 0000000031c03540 R: 00000000300192e0   .lock_caller+0x80
     S: 0000000031c03600 R: 0000000030012c70   .__flush_console+0x134
     S: 0000000031c036d0 R: 00000000300130cc   .console_write+0x68
     S: 0000000031c03780 R: 00000000300347bc   .vprlog+0xc8
     S: 0000000031c03970 R: 0000000030034844   ._prlog+0x50
     S: 0000000031c03a00 R: 00000000300364a4   .log_simple_error+0x74
     S: 0000000031c03b90 R: 000000003004ab48   .occ_pstates_init+0x184
     S: 0000000031c03d50 R: 000000003001480c   .load_and_boot_kernel+0x38c
     S: 0000000031c03e30 R: 000000003001571c   .main_cpu_entry+0x62c
     S: 0000000031c03f00 R: 0000000030002700   boot_entry+0x1c0

  Reported-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* cpu: Cleanup clearing of doorbells on P9 | Benjamin Herrenschmidt | 2018-05-24 | 1 | -4/+5
  We currently do a rather pointless msgclr prior to setting in_sleep/in_idle (with no ordering guarantee which isn't great).

  We also do the final msgsync/msgclr after setting in_sleep/in_idle back to false which while probably ok, isn't that great, we should do msgsync first thing when waking up.

  Finally, do p9_dbell_receive() before skip_sleep.

  So take out the first msgclr, swap the final p9_dbell_receive() and add a sync() for good measure and match what p8 does.

  Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* cpu: Use STOP1 on POWER9 for idle/sleep inside OPAL | Benjamin Herrenschmidt | 2018-05-24 | 1 | -4/+4
  The current code requests STOP3, which means it gets STOP2 in practice.

  STOP2 has proven to occasionally be unreliable depending on FW version and chip revision, and it also requires a functional CME, so let's use STOP1 instead. The difference is rather minimal for something that is only used for a few seconds during boot.

  Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>