No point doing that from init on the main CPU while it's
done already inside cpu.c for secondaries.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Fixes: 9567e18728d0559bc5f79ea927d684dc3b1e3555
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Split cpu_idle() into cpu_idle_delay() and cpu_idle_job() rather than
requesting the idle type as a function argument. Have those functions
provide a default polling (non-PM) implementation which spins at the
lowest SMT priority.
This moves all the decrementer delay code into the CPU idle code rather
than the caller.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Recent CPUs have introduced a lower SMT priority. This uses the
Linux pattern of executing priority nops in descending order to
get a simple, portable way to put the CPU into the lowest SMT priority.
Introduce smt_lowest() and use it in place of smt_very_low and
"smt_low; smt_very_low" sequences.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Some code paths want to look at all the CPUs that are "present",
which means they have been enabled by HB/Cronus and can be accessed
via XSCOMs, even if they haven't called in yet.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
The nest MMU SCOM address was hardcoded, which could lead to boot
failure on POWER9 systems without a nest MMU. For example, Mambo
doesn't model the nest MMU, which results in the following failure when
calling opal_nmmu_set_ptcr() during kernel load:
WARNING: 20856759: (20856757): Invalid address 0x0000000028096258 in XSCOM range, SCOM=0x00280962b
WARNING: 20856759: (20856757): Attempt to store non-existent address 0x00001A0028096258
20856759: (20856757): 0x000000003002DA08 : stdcix r26,r0,r3
FATAL ERROR: 20856759: (20856757): Check Stop for 0:0: Machine Check with ME bit of MSR off
This patch instead reads the address from the device-tree and makes
opal_nmmu_set_ptcr() return OPAL_UNSUPPORTED on systems without a nest
MMU.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
In the event we only have 1 CPU thread, we run asynchronous jobs
synchronously, and while we wait for them to finish, we run pollers.
However, if the jobs themselves don't call pollers (e.g. by time_wait())
then we'll end up in long periods of not running pollers at all.
To work around this, explicitly run pollers when we're the only
CPU thread (i.e. when we run the job synchronously).
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
We don't support it yet on others.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
POWER9 has an off core MMU called the Nest MMU which allows other
units within a chip to perform address translations. The context and
setup for translations is handled by the requesting agents, however
the Nest MMU does need to know where in system memory the page tables
are located.
This patch adds a call to setup the Nest MMU page table pointer on a
per-chip basis.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
This allows us to send threads to nap mode when either idle (waiting
for a job) or when in a sleep delay (time_wait*).
We only enable the functionality after the 0x100 vector has been
patched, and we disable it before transferring control to Linux.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
This will be handled by time_wait_ms(). Also remove a useless
smt_medium().
Note that this introduces a difference in behaviour: time_wait
will only call the pollers on the boot CPU, while cpu_wait_job()
could call them on any CPU. However, I can't think of a case where
this is a problem.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Instead, target a specific CPU for a global job at queuing time.
This will allow us to wake up the target using an interrupt when
implementing nap mode.
The algorithm used is to look for idle primary threads first, then
idle secondaries, and finally the least loaded thread. If nothing can
be found, we fall back to a synchronous call.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
For now a simple generic implementation using cpu_relax()
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Wrapper around list_empty_nocheck() to see if there's any
job pending on a given CPU
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
It doesn't work well to call it before the boot CPU structure
is initialized.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Otherwise we might trigger an assertion when list debugging is enabled.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
The checks validate pointers sent in using opal_addr_valid() in the
OPAL call APIs provided via the console, cpu, fdt, flash, i2c,
interrupts, nvram, opal-msg, opal, opal-pci, xscom and cec modules.
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
This allows opal_addr_valid() to perform the < top_of_mem check and pass
for early xscom_reads that read into the CPU stack.
Arguably the more correct solution is to split opal_xscom_read from
xscom_read and only validate for the OPAL call.
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Below are the in-memory console log messages observed at error level (PR_ERROR):
[54460318,3] HBRT: Mem region 'ibm,homer-image' not found !
[54465404,3] HBRT: Mem region 'ibm,homer-image' not found !
[54470372,3] HBRT: Mem region 'ibm,homer-image' not found !
[54475369,3] HBRT: Mem region 'ibm,homer-image' not found !
[11540917382,3] NVRAM: Layout appears sane
[11694529822,3] OPAL: Trying a CPU re-init with flags: 0x2
[61291003267,3] OPAL: Trying a CPU re-init with flags: 0x1
[61394005956,3] OPAL: Trying a CPU re-init with flags: 0x2
Lower the log level of the mem-region-not-found messages to PR_WARNING and the remaining messages to PR_INFO:
[54811683,4] HBRT: Mem region 'ibm,homer-image' not found !
[10923382751,6] NVRAM: Layout appears sane
[55533988976,6] OPAL: Trying a CPU re-init with flags: 0x1
Signed-off-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
On P8 we got it a bit wrong and would fall into a workaround for P8 DD1
HILE setting if other bits were set in the flags to OPAL_REINIT_CPUS,
limiting our opportunity to extend it in the future.
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Also make the locking around re-init safer, properly block the
OS from restarting a thread that was caught for re-init.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
ISAv3 adds a mode to increase the size of the decrementer from 32 bits.
The enlarged decrementer can be between 32 and 64 bits wide with the
exact value being implementation dependent. This patch adds support for
detecting the size of the large decrementer and populating each CPU node
with the "ibm,dec-bits" property.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
[stewart@linux.vnet.ibm.com: rename enable_ld() to enable_large_dec()]
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Commit 3ff35034 (Abstract HILE and attn enable bit definitions for HID0)
enabled the HILE bit instead of the ATTN bit in the enable/disable_attn
functions. Hence the OPAL assert fails.
Fixes: 3ff35034 (Abstract HILE and attn enable bit definitions for HID0)
CC: Michael Neuling <mikey@neuling.org>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Currently, the PHB reset and PCI enumeration are done concurrently
on multiple CPU cores. The output messages are interleaved and not
very readable. This adds an option to do the jobs in a serialized
fashion, for debugging purposes only. The serialized mode should
always be disabled in the field.
Suggested-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Add PVR detection, chip id and other misc bits for POWER9.
POWER9 changes the location of the HILE and attn enable bits in the
HID0 register, so add these definitions also.
Signed-off-by: Michael Neuling <mikey@neuling.org>
[stewart@linux.vnet.ibm.com: Fix Numbus typo, hdata_to_dt build fixes]
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Abstract HILE and attn enable bits definitions for HID0 in case these
locations randomly change in future chip revisions.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Currently we don't touch the attn enable bit in HID0 on boot.
When attn is enabled, it's available everywhere including HV=0. This
is very dangerous for the host kernel.
This explicitly disables the attn instruction on all CPUs on boot.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
This changes trigger_attn() to also enable attn via HID0, so callers
don't have to do it themselves.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
get_available_nr_cores_in_chip() takes 'chip_id' as an argument and
returns the number of available cores in the chip.
Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Fixes: 55ae15b
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
While root-causing a bug on an AST BMC, Alistair found that pollers
weren't being run for around 3800ms.
This was due to a wonderful accident, probably a year or more old,
where:
In cpu_wait_job we have:
unsigned long ticks = usecs_to_tb(5);
...
time_wait(ticks);
While in time_wait(), deciding on if to run pollers:
unsigned long period = msecs_to_tb(5);
...
if (remaining >= period) {
Obviously, this means we never run pollers. Not ideal.
This patch ensures we run pollers every 5ms in cpu_wait_job(), as well
as displaying how long we waited for a job if that wait was over 1 second.
Reported-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Fix potential NULL dereference of pointer returned
from dt_find_property() in cpu_remove_node().
Fixes coverity defect#97849.
Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Commit 87690bd19dbb introduced a label "again" so as to avoid holding
the lock while waiting. But this leads to a hang in scenarios like kdump,
where 'cpu_state_os' will be the state for all offline CPUs. Actually,
the wait loop never really takes off because of the goto statement in it.
This patch fixes the problem by removing the goto statement
and unlocking/locking within the wait loop instead.
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Use-after-free bug.
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
This modifies the code generated when built with GCOV so that the
stores of counter data are hoisted out of some of the loops, improving
things for GCOV builds.
It also likely makes us do even less work when relaxing, so probably a
good thing.
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
With SKIBOOT_GCOV=1 and multiple asm blocks for nop instruction, we end
up with this:
1a5f8: 60 00 00 00 nop
1a5fc: 60 00 00 00 nop
1a600: 60 00 00 00 nop
1a604: 60 00 00 00 nop
1a608: 3d 42 00 0e addis r10,r2,14
1a60c: e9 2a 20 f8 ld r9,8440(r10)
1a610: 39 29 00 01 addi r9,r9,1
1a614: f9 2a 20 f8 std r9,8440(r10)
1a618: 60 00 00 00 nop
1a61c: 60 00 00 00 nop
1a620: 60 00 00 00 nop
1a624: 60 00 00 00 nop
Which is not the desired code output for relaxing the CPU.
By batching up the instructions into one asm block, we get the
desired effect: just one set of mcount updates.
i.e. currently only Mambo
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
When we have multiple systems trying to start concurrent jobs on different
CPUs, they typically pick the first available (operating) CPU to schedule
the job on. This works fine when there's only one set of jobs or when we
want to bind jobs to specific CPUs.
When we have jobs such as asynchronously loading LIDs and scanning PHBs,
we don't care which CPUs they run on, we care more that they are not
scheduled on CPUs that have existing tasks.
This patch adds a global queue of jobs which secondary CPUs will look
at for work (if idle).
This leads to simplified callers, which just need to queue jobs to NULL
(no specific CPU) and then call a magic function that will run the
CPU job queue if we don't have secondary CPUs.
Additionally, we add a const char *name to cpu_job just to aid with
debugging.
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
In skiboot, CPU stacks are indexed by PIR.
During boot, we have two ideas about what the actual maximum PIR is:
1) detect CPU type (P7 or P8): we know max PIR is max for that proc
(e.g. 1024, 8192)
2) start all CPUs (go through device tree for CPUs that exist).
We now know the *actual* CPUs we have and the max PIR.
e.g. 1, 64, 3319 or whatever
Each CPU stack is 16KB.
So max CPU stacks size for P7 is 16MB, for P8 is 128MB.
The *actual* max for the machine we're booting on is based on max PIR
we detect during boot. I have found the following:
Mambo: 16KB max (one CPU)
P7: 64, meaning 64*16k = 1MB
P8: 3320, meaning 3320*16k = 51MB
So, currently, we were not resetting the size of the skiboot_cpu_stacks
memory region correctly before boot (we construct that part of the device
tree as the very last thing before booting the payload). Even though the
comment in mem_region.c would suggest we were, we weren't; code
comments are evil and are nothing but filthy, filthy lies.
With this patch, we now properly adjust the CPU stacks memory region
size after we've detected CPU type and after we've found the real
max PIR.
This saves between about 77MB and 128MB-16kB of memory from being in a
reserved region and it'll now be available to the OS to use for things
such as cat pictures rather than being firmware stack space waiting for
a CPU that will never appear.
You can see the difference in skiboot log, "Reserved regions:":
Before:
ALL: 0x000031a00000..0000399fffff : ibm,firmware-stacks
After:
Mambo: 0x000031a00000..000031a1ffff : ibm,firmware-stacks
P7: 0x000031a00000..000031afffff : ibm,firmware-stacks
P8: 0x000031a00000..000034ddffff : ibm,firmware-stacks
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
This adds the PVR and CFAM ID for the Naples chip. Otherwise treated as
a Venice.
This doesn't add the definitions for the new PHB revision yet.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Now that opal.h includes opal-api.h, there are a bunch of files that
include both but don't need to.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
In case of split core, some of the timer facility errors need cleanup
to be done before we proceed with the error recovery.
Certain TB/HDEC errors leave dirty data in the timebase and HDEC
registers, which needs to be cleared before we initiate clear_tb_errors
through TFMR[24].
The cleanup has to be done by any one thread from the core or subcore.
In split core mode, the dirty data in the TB/HDEC registers must be
cleared by all subcores (active partitions) before we clear TB errors
through TFMR[24]. The HMI recovery would fail even if one subcore does
not clean up its respective TB/HDEC register. Dirty data can be cleaned
by writing zeros to the TB/HDEC register.
For an un-split core, any one thread can do the cleanup.
For a split core, any one thread from each subcore can do the cleanup.
Errors that required pre-recovery cleanup:
- SPR_TFMR_TB_RESIDUE_ERR
- SPR_TFMR_HDEC_PARITY_ERROR
This patch implements pre-recovery steps to clean dirty data from TB/HDEC
register for above mentioned timer facility errors.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Cédric Le Goater <clg@fr.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>