path: root/arch/powerpc
Commit log (newest first); each entry shows the commit subject, author, date, and files/lines changed.
* powerpc/44x/fsp2: Add irq error handlers (Ivan Mikhaylov, 2018-03-02; 1 file changed, -1/+204)
  Add irq error handlers for cmu, plb, opb, mcue, conf with debug information
  output in case of problems.

  Signed-off-by: Ivan Mikhaylov <ivan@de.ibm.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  (cherry picked from commit 7813043e1bbcea764c86ef2fcf43aeae2b0e240d)
  Signed-off-by: Joel Stanley <joel@jms.id.au>
* powerpc/44x/fsp2: tvsense workaround for dd1 (Ivan Mikhaylov, 2018-03-02; 1 file changed, -0/+17)
  TVSENSE (temperature and voltage sensors) reset is blocked (clock gated) by
  the POR default of the TVS sleep config bit. As a consequence, TVSENSE will
  provide erratic sensor values, which may result in spurious (parity) errors
  recorded in the CMU FIR and lead to erroneous interrupt requests once the
  CMU interrupt is unmasked. The purpose of this change is to bring the CMU
  up in a working state in all cases, even in the presence of parity errors.

  Reviewed-by: Alistair Popple <alistair@popple.id.au>
  Signed-off-by: Ivan Mikhaylov <ivan@de.ibm.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  (cherry picked from commit 50f01c57d700cce04a9dc87bc55db2e283f57212)
  Signed-off-by: Joel Stanley <joel@jms.id.au>
* powerpc/44x/fsp2: Interrupt handling setup (Ivan Mikhaylov, 2018-03-02; 1 file changed, -0/+37)
  * clear out any possible plb6 errors
  * board interrupt handling setup within l2 reg set
  * fsp2 parity error setup

  All of these are needed for correct interrupt handling at the board level,
  including error handling reporting.

  Reviewed-by: Alistair Popple <alistair@popple.id.au>
  Signed-off-by: Ivan Mikhaylov <ivan@de.ibm.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  (cherry picked from commit 9c4c374676a12db4a452534f3347323d35c32d1a)
  Signed-off-by: Joel Stanley <joel@jms.id.au>
* powerpc/44x/fsp2: Add fsp2 headers (Ivan Mikhaylov, 2018-03-02; 1 file changed, -0/+272)
  Add cmu, plbX, l2, ddr3/4, crcs register definitions. Add mfcmu, mtcmu
  functions for indirect access to the cmu, and mtl2, mfl2 for the same on the
  l2 cache core register set.

  Signed-off-by: Ivan Mikhaylov <ivan@de.ibm.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  (cherry picked from commit 494d82ceae7f3b9fdb5154b4469e5bb22f56040b)
  Signed-off-by: Joel Stanley <joel@jms.id.au>
* powerpc/44x/fsp2: Enable eMMC arasan for fsp2 platform (Ivan Mikhaylov, 2017-12-07; 2 files changed, -14/+21)
  Add mmc0 changes for enabling arasan emmc and change defconfig
  appropriately.

  Signed-off-by: Ivan Mikhaylov <ivan@de.ibm.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  (cherry picked from commit 754f030908c3615781e9e3559d8ba1500cf82215)
  Signed-off-by: Joel Stanley <joel@jms.id.au>
* KVM: PPC: Book3S HV: Fix exclusion between HPT resizing and other HPT updates (Paul Mackerras, 2017-11-15; 2 files changed, -10/+29)
  commit 38c53af853069adf87181684370d7b8866d6387b upstream.

  Commit 5e9859699aba ("KVM: PPC: Book3S HV: Outline of KVM-HV HPT resizing
  implementation", 2016-12-20) added code that tries to exclude any use or
  update of the hashed page table (HPT) while the HPT resizing code is
  iterating through all the entries in the HPT. It does this by taking the
  kvm->lock mutex, clearing the kvm->arch.hpte_setup_done flag and then
  sending an IPI to all CPUs in the host. The idea is that any VCPU task that
  tries to enter the guest will see that the hpte_setup_done flag is clear and
  therefore call kvmppc_hv_setup_htab_rma, which also takes the kvm->lock
  mutex and will therefore block until we release kvm->lock.

  However, any VCPU that is already in the guest, or is handling a hypervisor
  page fault or hypercall, can re-enter the guest without rechecking the
  hpte_setup_done flag. The IPI will cause a guest exit of any VCPUs that are
  currently in the guest, but does not prevent those VCPU tasks from
  immediately re-entering the guest.

  The result is that after resize_hpt_rehash_hpte() has made a HPTE absent, a
  hypervisor page fault can occur and make that HPTE present again. This
  includes updating the rmap array for the guest real page, meaning that we
  now have a pointer in the rmap array which connects with pointers in the old
  rev array but not the new rev array. In fact, if the HPT is being reduced in
  size, the pointer in the rmap array could point outside the bounds of the
  new rev array. If that happens, we can get a host crash later on such as
  this one:

  [91652.628516] Unable to handle kernel paging request for data at address 0xd0000000157fb10c
  [91652.628668] Faulting instruction address: 0xc0000000000e2640
  [91652.628736] Oops: Kernel access of bad area, sig: 11 [#1]
  [91652.628789] LE SMP NR_CPUS=1024 NUMA PowerNV
  [91652.628847] Modules linked in: binfmt_misc vhost_net vhost tap xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables ses enclosure scsi_transport_sas i2c_opal ipmi_powernv ipmi_devintf i2c_core ipmi_msghandler powernv_op_panel nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc kvm_hv kvm_pr kvm scsi_dh_alua dm_service_time dm_multipath tg3 ptp pps_core [last unloaded: stap_552b612747aec2da355051e464fa72a1_14259]
  [91652.629566] CPU: 136 PID: 41315 Comm: CPU 21/KVM Tainted: G O 4.14.0-1.rc4.dev.gitb27fc5c.el7.centos.ppc64le #1
  [91652.629684] task: c0000007a419e400 task.stack: c0000000028d8000
  [91652.629750] NIP: c0000000000e2640 LR: d00000000c36e498 CTR: c0000000000e25f0
  [91652.629829] REGS: c0000000028db5d0 TRAP: 0300 Tainted: G O (4.14.0-1.rc4.dev.gitb27fc5c.el7.centos.ppc64le)
  [91652.629932] MSR: 900000010280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> CR: 44022422 XER: 00000000
  [91652.630034] CFAR: d00000000c373f84 DAR: d0000000157fb10c DSISR: 40000000 SOFTE: 1
  [91652.630034] GPR00: d00000000c36e498 c0000000028db850 c000000001403900 c0000007b7960000
  [91652.630034] GPR04: d0000000117fb100 d000000007ab00d8 000000000033bb10 0000000000000000
  [91652.630034] GPR08: fffffffffffffe7f 801001810073bb10 d00000000e440000 d00000000c373f70
  [91652.630034] GPR12: c0000000000e25f0 c00000000fdb9400 f000000003b24680 0000000000000000
  [91652.630034] GPR16: 00000000000004fb 00007ff7081a0000 00000000000ec91a 000000000033bb10
  [91652.630034] GPR20: 0000000000010000 00000000001b1190 0000000000000001 0000000000010000
  [91652.630034] GPR24: c0000007b7ab8038 d0000000117fb100 0000000ec91a1190 c000001e6a000000
  [91652.630034] GPR28: 00000000033bb100 000000000073bb10 c0000007b7960000 d0000000157fb100
  [91652.630735] NIP [c0000000000e2640] kvmppc_add_revmap_chain+0x50/0x120
  [91652.630806] LR [d00000000c36e498] kvmppc_book3s_hv_page_fault+0xbb8/0xc40 [kvm_hv]
  [91652.630884] Call Trace:
  [91652.630913] [c0000000028db850] [c0000000028db8b0] 0xc0000000028db8b0 (unreliable)
  [91652.630996] [c0000000028db8b0] [d00000000c36e498] kvmppc_book3s_hv_page_fault+0xbb8/0xc40 [kvm_hv]
  [91652.631091] [c0000000028db9e0] [d00000000c36a078] kvmppc_vcpu_run_hv+0xdf8/0x1300 [kvm_hv]
  [91652.631179] [c0000000028dbb30] [d00000000c2248c4] kvmppc_vcpu_run+0x34/0x50 [kvm]
  [91652.631266] [c0000000028dbb50] [d00000000c220d54] kvm_arch_vcpu_ioctl_run+0x114/0x2a0 [kvm]
  [91652.631351] [c0000000028dbbd0] [d00000000c2139d8] kvm_vcpu_ioctl+0x598/0x7a0 [kvm]
  [91652.631433] [c0000000028dbd40] [c0000000003832e0] do_vfs_ioctl+0xd0/0x8c0
  [91652.631501] [c0000000028dbde0] [c000000000383ba4] SyS_ioctl+0xd4/0x130
  [91652.631569] [c0000000028dbe30] [c00000000000b8e0] system_call+0x58/0x6c
  [91652.631635] Instruction dump:
  [91652.631676] fba1ffe8 fbc1fff0 fbe1fff8 f8010010 f821ffa1 2fa70000 793d0020 e9432110
  [91652.631814] 7bbf26e4 7c7e1b78 7feafa14 409e0094 <807f000c> 786326e4 7c6a1a14 93a40008
  [91652.631959] ---[ end trace ac85ba6db72e5b2e ]---

  To fix this, we tighten up the way that the hpte_setup_done flag is checked
  to ensure that it does provide the guarantee that the resizing code needs.
  In kvmppc_run_core(), we check the hpte_setup_done flag after disabling
  interrupts and refuse to enter the guest if it is clear (for a HPT guest).
  The code that checks hpte_setup_done and calls kvmppc_hv_setup_htab_rma() is
  moved from kvmppc_vcpu_run_hv() to a point inside the main loop in
  kvmppc_run_vcpu(), ensuring that we don't just spin endlessly calling
  kvmppc_run_core() while hpte_setup_done is clear, but instead have a chance
  to block on the kvm->lock mutex.

  Finally we also check hpte_setup_done inside the region in
  kvmppc_book3s_hv_page_fault() where the HPTE is locked and we are about to
  update the HPTE, and bail out if it is clear. If another CPU is inside
  kvm_vm_ioctl_resize_hpt_commit() and has cleared hpte_setup_done, then we
  know that either we are looking at a HPTE that resize_hpt_rehash_hpte() has
  not yet processed, which is OK, or else we will see hpte_setup_done clear
  and refuse to update it, because of the full barrier formed by the unlock of
  the HPTE in resize_hpt_rehash_hpte() combined with the locking of the HPTE
  in kvmppc_book3s_hv_page_fault().

  Fixes: 5e9859699aba ("KVM: PPC: Book3S HV: Outline of KVM-HV HPT resizing implementation")
  Reported-by: Satheesh Rajendran <satheera@in.ibm.com>
  Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc/kprobes: Dereference function pointers only if the address does not belong to kernel text (Naveen N. Rao, 2017-11-08; 1 file changed, -1/+6)
  commit e6c4dcb308160115287afd87afb63b5684d75a5b upstream.

  This makes the changes introduced in commit 83e840c770f2c5 ("powerpc64/elfv1:
  Only dereference function descriptor for non-text symbols") specific to the
  kprobe subsystem.

  We previously changed ppc_function_entry() to always check the provided
  address to confirm if it needed to be dereferenced. This is actually only an
  issue for kprobe blacklisted asm labels (through use of
  _ASM_NOKPROBE_SYMBOL) and can cause other issues with ftrace. Also, the
  additional checks are not really necessary for our other uses. As such, move
  this check to the kprobes subsystem.

  Fixes: 83e840c770f2 ("powerpc64/elfv1: Only dereference function descriptor for non-text symbols")
  Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* Revert "powerpc64/elfv1: Only dereference function descriptor for non-text ↵Naveen N. Rao2017-11-081-9/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | symbols" commit 63be1a81e40733ecd175713b6a7558dc43f00851 upstream. This reverts commit 83e840c770f2c5 ("powerpc64/elfv1: Only dereference function descriptor for non-text symbols"). Chandan reported that on newer kernels, trying to enable function_graph tracer on ppc64 (BE) locks up the system with the following trace: Unable to handle kernel paging request for data at address 0x600000002fa30010 Faulting instruction address: 0xc0000000001f1300 Thread overran stack, or stack corrupted Oops: Kernel access of bad area, sig: 11 [#1] BE SMP NR_CPUS=2048 DEBUG_PAGEALLOC NUMA pSeries Modules linked in: CPU: 1 PID: 6586 Comm: bash Not tainted 4.14.0-rc3-00162-g6e51f1f-dirty #20 task: c000000625c07200 task.stack: c000000625c07310 NIP: c0000000001f1300 LR: c000000000121cac CTR: c000000000061af8 REGS: c000000625c088c0 TRAP: 0380 Not tainted (4.14.0-rc3-00162-g6e51f1f-dirty) MSR: 8000000000001032 <SF,ME,IR,DR,RI> CR: 28002848 XER: 00000000 CFAR: c0000000001f1320 SOFTE: 0 ... NIP [c0000000001f1300] .__is_insn_slot_addr+0x30/0x90 LR [c000000000121cac] .kernel_text_address+0x18c/0x1c0 Call Trace: [c000000625c08b40] [c0000000001bd040] .is_module_text_address+0x20/0x40 (unreliable) [c000000625c08bc0] [c000000000121cac] .kernel_text_address+0x18c/0x1c0 [c000000625c08c50] [c000000000061960] .prepare_ftrace_return+0x50/0x130 [c000000625c08cf0] [c000000000061b10] .ftrace_graph_caller+0x14/0x34 [c000000625c08d60] [c000000000121b40] .kernel_text_address+0x20/0x1c0 [c000000625c08df0] [c000000000061960] .prepare_ftrace_return+0x50/0x130 ... [c000000625c0ab30] [c000000000061960] .prepare_ftrace_return+0x50/0x130 [c000000625c0abd0] [c000000000061b10] .ftrace_graph_caller+0x14/0x34 [c000000625c0ac40] [c000000000121b40] .kernel_text_address+0x20/0x1c0 [c000000625c0acd0] [c000000000061960] .prepare_ftrace_return+0x50/0x130 [c000000625c0ad70] [c000000000061b10] .ftrace_graph_caller+0x14/0x34 [c000000625c0ade0] [c000000000121b40] .kernel_text_address+0x20/0x1c0 This is because ftrace is using ppc_function_entry() for obtaining the address of return_to_handler() in prepare_ftrace_return(). The call to kernel_text_address() itself gets traced and we end up in a recursive loop. Fixes: 83e840c770f2 ("powerpc64/elfv1: Only dereference function descriptor for non-text symbols") Reported-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc/xive: Fix the size of the cpumask used in xive_find_target_in_mask() (Cédric Le Goater, 2017-11-02; 1 file changed, -1/+1)
  commit a9dadc1c512807f955f0799e85830b420da47932 upstream.

  When called from xive_irq_startup(), the size of the cpumask can be larger
  than nr_cpu_ids. This can result in a WARN_ON such as:

  WARNING: CPU: 10 PID: 1 at ../arch/powerpc/sysdev/xive/common.c:476 xive_find_target_in_mask+0x110/0x2f0
  ...
  NIP [c00000000008a310] xive_find_target_in_mask+0x110/0x2f0
  LR [c00000000008a2e4] xive_find_target_in_mask+0xe4/0x2f0
  Call Trace:
  xive_find_target_in_mask+0x74/0x2f0 (unreliable)
  xive_pick_irq_target.isra.1+0x200/0x230
  xive_irq_startup+0x60/0x180
  irq_startup+0x70/0xd0
  __setup_irq+0x7bc/0x880
  request_threaded_irq+0x14c/0x2c0
  request_event_sources_irqs+0x100/0x180
  __machine_initcall_pseries_init_ras_IRQ+0x104/0x134
  do_one_initcall+0x68/0x1d0
  kernel_init_freeable+0x290/0x374
  kernel_init+0x24/0x170
  ret_from_kernel_thread+0x5c/0x74

  This happens because we're being called with our affinity mask set to
  irq_default_affinity. That in turn was populated using cpumask_setall(),
  which sets NR_CPUS worth of bits, not nr_cpu_ids worth. Finally
  cpumask_weight() will return > nr_cpu_ids when passed a mask which has
  > nr_cpu_ids bits set.

  Fix it by limiting the value returned by cpumask_weight().

  Signed-off-by: Cédric Le Goater <clg@kaod.org>
  [mpe: Add change log details on actual cause]
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Cc: Stewart Smith <stewart@linux.vnet.ibm.com>
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
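  A minimal sketch of the clamp this commit describes, assuming a small helper
  of this shape near xive_find_target_in_mask() (the function name is
  illustrative, not the upstream code):

      #include <linux/cpumask.h>

      static unsigned int xive_mask_weight(const struct cpumask *mask)
      {
              unsigned int num = cpumask_weight(mask);

              /* A mask filled by cpumask_setall() can have up to NR_CPUS bits set. */
              if (num > nr_cpu_ids)
                      num = nr_cpu_ids;

              return num;
      }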
* KVM: PPC: Book3S: Protect kvmppc_gpa_to_ua() with SRCU (Alexey Kardashevskiy, 2017-11-02; 1 file changed, -9/+14)
  commit 8f6a9f0d0604817f7c8d4376fd51718f1bf192ee upstream.

  kvmppc_gpa_to_ua() accesses the KVM memory slot array via
  srcu_dereference_check() and this produces warnings from RCU like below.

  This extends the existing srcu_read_lock/unlock to cover the
  kvmppc_gpa_to_ua() call as well.

  We did not hit this before as this lock is not needed for the realmode
  handlers and hash guests would use the realmode path all the time; however
  the radix guests are always redirected to the virtual mode handlers and
  hence the warning.

  [ 68.253798] ./include/linux/kvm_host.h:575 suspicious rcu_dereference_check() usage!
  [ 68.253799] other info that might help us debug this:
  [ 68.253802] rcu_scheduler_active = 2, debug_locks = 1
  [ 68.253804] 1 lock held by qemu-system-ppc/6413:
  [ 68.253806] #0: (&vcpu->mutex){+.+.}, at: [<c00800000e3c22f4>] vcpu_load+0x3c/0xc0 [kvm]
  [ 68.253826] stack backtrace:
  [ 68.253830] CPU: 92 PID: 6413 Comm: qemu-system-ppc Tainted: G W 4.14.0-rc3-00553-g432dcba58e9c-dirty #72
  [ 68.253833] Call Trace:
  [ 68.253839] [c000000fd3d9f790] [c000000000b7fcc8] dump_stack+0xe8/0x160 (unreliable)
  [ 68.253845] [c000000fd3d9f7d0] [c0000000001924c0] lockdep_rcu_suspicious+0x110/0x180
  [ 68.253851] [c000000fd3d9f850] [c0000000000e825c] kvmppc_gpa_to_ua+0x26c/0x2b0
  [ 68.253858] [c000000fd3d9f8b0] [c00800000e3e1984] kvmppc_h_put_tce+0x12c/0x2a0 [kvm]

  Fixes: 121f80ba68f1 ("KVM: PPC: VFIO: Add in-kernel acceleration for VFIO")
  Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
  Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
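  A sketch of the widened SRCU critical section described above; the TCE
  processing is elided and the exact call sites are in the upstream commit:

      idx = srcu_read_lock(&vcpu->kvm->srcu);
      /*
       * kvmppc_gpa_to_ua() and the rest of the H_PUT_TCE handling now run
       * here, with the memslot array protected by SRCU.
       */
      srcu_read_unlock(&vcpu->kvm->srcu, idx);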
* KVM: PPC: Book3S HV: POWER9 more doorbell fixes (Nicholas Piggin, 2017-11-02; 1 file changed, -0/+5)
  commit 2cde3716321ec64a1faeaf567bd94100c7b4160f upstream.

  - Add another case where msgsync is required.
  - Required barrier sequence for global doorbells is msgsync ; lwsync

  When msgsnd is used for IPIs to other cores, msgsync must be executed by the
  target to order stores performed on the source before its msgsnd (provided
  the source executes the appropriate sync).

  Fixes: 1704a81ccebc ("KVM: PPC: Book3S HV: Use msgsnd for IPIs to other cores on POWER9")
  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* KVM: PPC: Fix oops when checking KVM_CAP_PPC_HTM (Greg Kurz, 2017-11-02; 1 file changed, -2/+1)
  commit ac64115a66c18c01745bbd3c47a36b124e5fd8c0 upstream.

  The following program causes a kernel oops:

      #include <sys/types.h>
      #include <sys/stat.h>
      #include <fcntl.h>
      #include <sys/ioctl.h>
      #include <linux/kvm.h>

      main()
      {
          int fd = open("/dev/kvm", O_RDWR);
          ioctl(fd, KVM_CHECK_EXTENSION, KVM_CAP_PPC_HTM);
      }

  This happens because when using the global KVM fd with KVM_CHECK_EXTENSION,
  kvm_vm_ioctl_check_extension() gets called with a NULL kvm argument, which
  gets dereferenced in is_kvmppc_hv_enabled(). Spotted while reading the code.

  Let's use the hv_enabled fallback variable, like everywhere else in this
  function.

  Fixes: 23528bb21ee2 ("KVM: PPC: Introduce KVM_CAP_PPC_HTM")
  Signed-off-by: Greg Kurz <groug@kaod.org>
  Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
  Reviewed-by: Thomas Huth <thuth@redhat.com>
  Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* KVM: PPC: Book3S: Fix server always zero from kvmppc_xive_get_xive() (Sam Bobroff, 2017-10-12; 2 files changed, -4/+2)
  commit 2fb1e946450a4fef74bb72f360555f7760d816f0 upstream.

  In KVM's XICS-on-XIVE emulation, kvmppc_xive_get_xive() returns the value of
  state->guest_server as "server". However, this value is not set by its
  counterpart kvmppc_xive_set_xive(). When the guest uses this interface to
  migrate interrupts away from a CPU that is going offline, it sees all
  interrupts as belonging to CPU 0, so they are left assigned to (now) offline
  CPUs.

  This patch removes the guest_server field from the state, and returns
  act_server in its place (that is, the CPU actually handling the interrupt,
  which may differ from the one requested).

  Fixes: 5af50993850a ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller")
  Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
  Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc/tm: Fix illegal TM state in signal handler (Gustavo Romero, 2017-10-12; 1 file changed, -1/+12)
  commit 044215d145a7a8a60ffa8fdc859d110a795fa6ea upstream.

  Currently it's possible that on returning from the signal handler through
  the restore_tm_sigcontexts() code path (e.g. from a signal caught due to a
  `trap` instruction executed in the middle of an HTM block, or a deliberately
  constructed sigframe) an illegal TM state (like TS=10 TM=0, i.e. "T0") is
  set in SRR1 and when `rfid` sets implicitly the MSR register from SRR1
  register on return to userspace it causes a TM Bad Thing exception.

  That illegal state can be set (a) by a malicious user that disables the TM
  bit by tweaking the bits in uc_mcontext before returning from the signal
  handler or (b) by a sufficient number of context switches occurring such
  that the load_tm counter overflows and TM is disabled whilst in the signal
  handler.

  This commit fixes the illegal TM state by ensuring that TM bit is always
  enabled before we return from restore_tm_sigcontexts(). A small comment
  correction is made as well.

  Fixes: 5d176f751ee3 ("powerpc: tm: Enable transactional memory (TM) lazily for userspace")
  Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com>
  Signed-off-by: Breno Leitao <leitao@debian.org>
  Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc/64s: Use emergency stack for kernel TM Bad Thing program checks (Cyril Bur, 2017-10-12; 1 file changed, -1/+23)
  commit 265e60a170d0a0ecfc2d20490134ed2c48dd45ab upstream.

  When using transactional memory (TM), the CPU can be in one of six states as
  far as TM is concerned, encoded in the Machine State Register (MSR). Certain
  state transitions are illegal and if attempted trigger a "TM Bad Thing" type
  program check exception.

  If we ever hit one of these exceptions it's treated as a bug, ie. we oops,
  and kill the process and/or panic, depending on configuration.

  One case where we can trigger a TM Bad Thing, is when returning to userspace
  after a system call or interrupt, using RFID. When this happens the CPU
  first restores the user register state, in particular r1 (the stack pointer)
  and then attempts to update the MSR. However the MSR update is not allowed
  and so we take the program check with the user register state, but the
  kernel MSR.

  This tricks the exception entry code into thinking we have a bad kernel
  stack pointer, because the MSR says we're coming from the kernel, but r1 is
  pointing to userspace.

  To avoid this we instead always switch to the emergency stack if we take a
  TM Bad Thing from the kernel. That way none of the user register values are
  used, other than for printing in the oops message.

  This is the fix for CVE-2017-1000255.

  Fixes: 5d176f751ee3 ("powerpc: tm: Enable transactional memory (TM) lazily for userspace")
  Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
  [mpe: Rewrite change log & comments, tweak asm slightly]
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc: Fix action argument for cpufeatures-based TLB flush (Jeremy Kerr, 2017-10-12; 1 file changed, -2/+2)
  commit 3b7af5c0fd9631762d1c4d7b4cee76f571dd3c2c upstream.

  Commit 41d0c2ecde19 ("powerpc/powernv: Fix local TLB flush for boot and MCE
  on POWER9") introduced calls to __flush_tlb_power[89] from the cpufeatures
  code, specifying the number of sets to flush.

  However, these functions take an action argument, not a number of sets. This
  means we hit the BUG() in __flush_tlb_{206,300} when using
  cpufeatures-style configuration.

  This change passes TLB_INVAL_SCOPE_GLOBAL instead.

  Fixes: 41d0c2ecde19 ("powerpc/powernv: Fix local TLB flush for boot and MCE on POWER9")
  Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
  Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
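  An illustration of the corrected call site, assuming the cpufeatures setup
  invokes the POWER9 flush directly (a sketch, not the upstream diff):

      #include <asm/cputable.h>   /* TLB_INVAL_SCOPE_GLOBAL */

      static void cpufeatures_flush_tlb_boot(void)
      {
              /* Pass a flush scope (the action argument), not a set count. */
              __flush_tlb_power9(TLB_INVAL_SCOPE_GLOBAL);
      }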
* powerpc/powernv: Increase memory block size to 1GB on radix (Anton Blanchard, 2017-10-12; 1 file changed, -1/+9)
  commit 53ecde0b9126ff140abe3aefd7f0ec64d6fa36b0 upstream.

  Memory hot unplug on PowerNV radix hosts is broken. Our memory block size is
  256MB but since we map the linear region with very large pages, each pte we
  tear down maps 1GB.

  A hot unplug of one 256MB memory block results in 768MB of memory getting
  unintentionally unmapped. At this point we are likely to oops.

  Fix this by increasing our memory block size to 1GB on PowerNV radix hosts.

  Fixes: 4b5d62ca17a1 ("powerpc/mm: add radix__remove_section_mapping()")
  Signed-off-by: Anton Blanchard <anton@samba.org>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
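  A sketch of the policy this commit describes; the helper name is an
  assumption for illustration:

      #include <linux/sizes.h>
      #include <asm/mmu.h>        /* radix_enabled() */

      static unsigned long pnv_memory_block_size(void)
      {
              /* Radix maps the linear region with 1GB pages, so unplug in 1GB units. */
              return radix_enabled() ? SZ_1G : SZ_256M;
      }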
* powerpc/tm: Flush TM only if CPU has TM feature (Gustavo Romero, 2017-10-05; 1 file changed, -1/+1)
  commit c1fa0768a8713b135848f78fd43ffc208d8ded70 upstream.

  Commit cd63f3c ("powerpc/tm: Fix saving of TM SPRs in core dump") added code
  to access TM SPRs in flush_tmregs_to_thread(). However
  flush_tmregs_to_thread() does not check if the TM feature is available on
  the CPU before trying to access TM SPRs in order to copy live state to
  thread structures. flush_tmregs_to_thread() is indeed guarded by
  CONFIG_PPC_TRANSACTIONAL_MEM but it might be the case that the kernel was
  compiled with CONFIG_PPC_TRANSACTIONAL_MEM enabled and ran on a CPU without
  the TM feature available, thus resulting in the execution of TM instructions
  that the CPU treats as illegal instructions.

  The fix is just to add proper checking in flush_tmregs_to_thread() if the
  CPU has the TM feature before accessing any TM-specific resource, returning
  immediately if TM is not available on the CPU. Adding that checking in
  flush_tmregs_to_thread() instead of in places where it is called, like in
  vsr_get() and vsr_set(), is better because it avoids the same problem
  cropping up elsewhere.

  Fixes: cd63f3c ("powerpc/tm: Fix saving of TM SPRs in core dump")
  Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com>
  Reviewed-by: Cyril Bur <cyrilbur@gmail.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
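  A minimal sketch of the guard described above; the function body is elided
  and only the early return is shown:

      #include <linux/sched.h>
      #include <asm/cputable.h>

      static void flush_tmregs_to_thread(struct task_struct *tsk)
      {
              /* Nothing to flush on CPUs without the TM facility. */
              if (!cpu_has_feature(CPU_FTR_TM))
                      return;

              /* ... copy live TM SPR state into tsk->thread ... */
      }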
* powerpc/pseries: Fix parent_dn reference leak in add_dt_node() (Tyrel Datwyler, 2017-10-05; 1 file changed, -1/+3)
  commit b537ca6fede69a281dc524983e5e633d79a10a08 upstream.

  A reference to the parent device node is held by add_dt_node() for the node
  to be added. If the call to dlpar_configure_connector() fails, add_dt_node()
  returns ENOENT and that reference is not freed.

  Add a call to of_node_put(parent_dn) prior to bailing out after a failed
  dlpar_configure_connector() call.

  Fixes: 8d5ff320766f ("powerpc/pseries: Make dlpar_configure_connector parent node aware")
  Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
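  An illustration of the error path described above; the surrounding code is
  paraphrased from the commit message rather than copied from dlpar.c:

      #include <linux/errno.h>
      #include <linux/of.h>

      static int add_dt_node_sketch(struct device_node *parent_dn, __be32 drc_index)
      {
              struct device_node *dn;

              dn = dlpar_configure_connector(drc_index, parent_dn);
              if (!dn) {
                      /* Drop the reference held on the parent before bailing out. */
                      of_node_put(parent_dn);
                      return -ENOENT;
              }

              /* ... attach dn, then release parent_dn on the normal path too ... */
              return 0;
      }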
* powerpc/eeh: Create PHB PEs after EEH is initialized (Benjamin Herrenschmidt, 2017-10-05; 2 files changed, -18/+4)
  commit 3e77adeea3c5393c9b624832f65441e92867f618 upstream.

  Otherwise we end up not yet having computed the right diag data size on
  powernv where EEH initialization is delayed, thus causing memory corruption
  later on when calling OPAL.

  Fixes: 5cb1f8fdddb7 ("powerpc/powernv/pci: Dynamically allocate PHB diag data")
  Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  Acked-by: Russell Currey <ruscur@russell.cc>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* KVM: PPC: Book3S HV: Don't access XIVE PIPR register using byte accesses (Benjamin Herrenschmidt, 2017-10-05; 3 files changed, -5/+4)
  commit d222af072380c4470295c07d84ecb15f4937e365 upstream.

  The XIVE interrupt controller on POWER9 machines doesn't support byte
  accesses to any register in the thread management area other than the CPPR
  (current processor priority register). In particular, when reading the PIPR
  (pending interrupt priority register), we need to do a 32-bit or 64-bit
  load.

  Fixes: 2c4fb78f78b6 ("KVM: PPC: Book3S HV: Workaround POWER9 DD1.0 bug causing IPB bit loss")
  Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* KVM: PPC: Book3S HV: Fix bug causing host SLB to be restored incorrectly (Paul Mackerras, 2017-10-05; 1 file changed, -1/+16)
  commit 67f8a8c1151c9ef3d1285905d1e66ebb769ecdf7 upstream.

  Aneesh Kumar reported seeing host crashes when running recent kernels on
  POWER8. The symptom was an oops like this:

  Unable to handle kernel paging request for data at address 0xf00000000786c620
  Faulting instruction address: 0xc00000000030e1e4
  Oops: Kernel access of bad area, sig: 11 [#1]
  LE SMP NR_CPUS=2048 NUMA PowerNV
  Modules linked in: powernv_op_panel
  CPU: 24 PID: 6663 Comm: qemu-system-ppc Tainted: G W 4.13.0-rc7-43932-gfc36c59 #2
  task: c000000fdeadfe80 task.stack: c000000fdeb68000
  NIP: c00000000030e1e4 LR: c00000000030de6c CTR: c000000000103620
  REGS: c000000fdeb6b450 TRAP: 0300 Tainted: G W (4.13.0-rc7-43932-gfc36c59)
  MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 24044428 XER: 20000000
  CFAR: c00000000030e134 DAR: f00000000786c620 DSISR: 40000000 SOFTE: 0
  GPR00: 0000000000000000 c000000fdeb6b6d0 c0000000010bd000 000000000000e1b0
  GPR04: c00000000115e168 c000001fffa6e4b0 c00000000115d000 c000001e1b180386
  GPR08: f000000000000000 c000000f9a8913e0 f00000000786c600 00007fff587d0000
  GPR12: c000000fdeb68000 c00000000fb0f000 0000000000000001 00007fff587cffff
  GPR16: 0000000000000000 c000000000000000 00000000003fffff c000000fdebfe1f8
  GPR20: 0000000000000004 c000000fdeb6b8a8 0000000000000001 0008000000000040
  GPR24: 07000000000000c0 00007fff587cffff c000000fdec20bf8 00007fff587d0000
  GPR28: c000000fdeca9ac0 00007fff587d0000 00007fff587c0000 00007fff587d0000
  NIP [c00000000030e1e4] __get_user_pages_fast+0x434/0x1070
  LR [c00000000030de6c] __get_user_pages_fast+0xbc/0x1070
  Call Trace:
  [c000000fdeb6b6d0] [c00000000139dab8] lock_classes+0x0/0x35fe50 (unreliable)
  [c000000fdeb6b7e0] [c00000000030ef38] get_user_pages_fast+0xf8/0x120
  [c000000fdeb6b830] [c000000000112318] kvmppc_book3s_hv_page_fault+0x308/0xf30
  [c000000fdeb6b960] [c00000000010e10c] kvmppc_vcpu_run_hv+0xfdc/0x1f00
  [c000000fdeb6bb20] [c0000000000e915c] kvmppc_vcpu_run+0x2c/0x40
  [c000000fdeb6bb40] [c0000000000e5650] kvm_arch_vcpu_ioctl_run+0x110/0x300
  [c000000fdeb6bbe0] [c0000000000d6468] kvm_vcpu_ioctl+0x528/0x900
  [c000000fdeb6bd40] [c0000000003bc04c] do_vfs_ioctl+0xcc/0x950
  [c000000fdeb6bde0] [c0000000003bc930] SyS_ioctl+0x60/0x100
  [c000000fdeb6be30] [c00000000000b96c] system_call+0x58/0x6c
  Instruction dump:
  7ca81a14 2fa50000 41de0010 7cc8182a 68c60002 78c6ffe2 0b060000 3cc2000a
  794a3664 390610d8 e9080000 7d485214 <e90a0020> 7d435378 790507e1 408202f0
  ---[ end trace fad4a342d0414aa2 ]---

  It turns out that what has happened is that the SLB entry for the vmemmap
  region hasn't been reloaded on exit from a guest, and it has the wrong page
  size. Then, when the host next accesses the vmemmap region, it gets a page
  fault.

  Commit a25bd72badfa ("powerpc/mm/radix: Workaround prefetch issue with KVM",
  2017-07-24) modified the guest exit code so that it now only clears out the
  SLB for hash guests. The code tests the radix flag and puts the result in a
  non-volatile CR field, CR2, and later branches based on CR2.

  Unfortunately, the kvmppc_save_tm function, which gets called between those
  two points, modifies all the user-visible registers in the case where the
  guest was in transactional or suspended state, except for a few which it
  restores (namely r1, r2, r9 and r13). Thus the hash/radix indication in CR2
  gets corrupted.

  This fixes the problem by re-doing the comparison just before the result is
  needed. For good measure, this also adds comments next to the call sites of
  kvmppc_save_tm and kvmppc_restore_tm pointing out that non-volatile register
  state will be lost.

  Fixes: a25bd72badfa ("powerpc/mm/radix: Workaround prefetch issue with KVM")
  Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
  Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* KVM: PPC: Book3S HV: Hold kvm->lock around call to kvmppc_update_lpcr (Paul Mackerras, 2017-10-05; 1 file changed, -0/+2)
  commit cf5f6f3125241853462334b1bc696f3c3c492178 upstream.

  Commit 468808bd35c4 ("KVM: PPC: Book3S HV: Set process table for HPT guests
  on POWER9", 2017-01-30) added a call to kvmppc_update_lpcr() which doesn't
  hold the kvm->lock mutex around the call, as required. This adds the
  lock/unlock pair, and for good measure, includes the
  kvmppc_setup_partition_table() call in the locked region, since it is
  altering global state of the VM.

  This error appears not to have any fatal consequences for the host; the
  consequences would be that the VCPUs could end up running with different
  LPCR values, or an update to the LPCR value by userspace using the one_reg
  interface could get overwritten, or the update done by kvmhv_configure_mmu()
  could get overwritten.

  Fixes: 468808bd35c4 ("KVM: PPC: Book3S HV: Set process table for HPT guests on POWER9")
  Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
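  A sketch of the locking described above; the wrapper name and the lpcr/mask
  values are illustrative only:

      static void kvmppc_mmu_config_locked(struct kvm *kvm, unsigned long lpcr,
                                           unsigned long mask)
      {
              mutex_lock(&kvm->lock);
              kvmppc_setup_partition_table(kvm);   /* alters global VM state */
              kvmppc_update_lpcr(kvm, lpcr, mask); /* must run under kvm->lock */
              mutex_unlock(&kvm->lock);
      }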
* powerpc: Fix DAR reporting when alignment handler faults (Michael Ellerman, 2017-09-27; 1 file changed, -45/+74)
  commit f9effe925039cf54489b5c04e0d40073bb3a123d upstream.

  Anton noticed that if we fault part way through emulating an unaligned
  instruction, we don't update the DAR to reflect that.

  The DAR value is eventually reported back to userspace as the address in the
  SEGV signal, and if userspace is using that value to demand fault then it
  can be confused by us not setting the value correctly.

  This patch is ugly as hell, but is intended to be the minimal fix and back
  ports easily.

  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Reviewed-by: Paul Mackerras <paulus@ozlabs.org>
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc/pseries: Don't attempt to acquire drc during memory hot add for assigned lmbs (John Allen, 2017-09-27; 1 file changed, -0/+4)
  commit afb5519fdb346201728040cab4e08ce53e7ff4fd upstream.

  Check if an LMB is assigned before attempting to call dlpar_acquire_drc in
  order to avoid any unnecessary rtas calls. This substantially reduces the
  running time of memory hot add on lpars with large amounts of memory.

  [mpe: We need to explicitly set rc to 0 in the success case, otherwise the
  compiler might think we use rc without initialising it.]

  Fixes: c21f515c7436 ("powerpc/pseries: Make the acquire/release of the drc for memory a seperate step")
  Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
  Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc/powernv/npu: Move tlb flush before launching ATSD (Alistair Popple, 2017-09-27; 1 file changed, -6/+6)
  commit bab9f954aaf352127725a9b7920226abdb65b604 upstream.

  The nest MMU tlb flush needs to happen before the GPU translation shootdown
  is launched to avoid the GPU refilling its tlb with stale nmmu translations
  prior to the nmmu flush completing.

  Fixes: 1ab66d1fbada ("powerpc/powernv: Introduce address translation services for Nvlink2")
  Signed-off-by: Alistair Popple <alistair@popple.id.au>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* KVM: update to new mmu_notifier semantic v2 (Jérôme Glisse, 2017-08-31; 1 file changed, -5/+0)
  Calls to mmu_notifier_invalidate_page() were replaced by calls to
  mmu_notifier_invalidate_range() and are now bracketed by calls to
  mmu_notifier_invalidate_range_start()/end().

  Remove now useless invalidate_page callback.

  Changed since v1 (Linus Torvalds):
  - remove now useless kvm_arch_mmu_notifier_invalidate_page()

  Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
  Tested-by: Mike Galbraith <efault@gmx.de>
  Tested-by: Adam Borowski <kilobyte@angband.pl>
  Cc: Paolo Bonzini <pbonzini@redhat.com>
  Cc: Radim Krčmář <rkrcmar@redhat.com>
  Cc: kvm@vger.kernel.org
  Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
  Cc: Andrew Morton <akpm@linux-foundation.org>
  Cc: Andrea Arcangeli <aarcange@redhat.com>
  Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* powerpc/powernv: update to new mmu_notifier semantic (Jérôme Glisse, 2017-08-31; 1 file changed, -10/+0)
  Calls to mmu_notifier_invalidate_page() were replaced by calls to
  mmu_notifier_invalidate_range() and now are bracketed by calls to
  mmu_notifier_invalidate_range_start()/end().

  Remove now useless invalidate_page callback.

  Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
  Cc: linuxppc-dev@lists.ozlabs.org
  Cc: Alistair Popple <alistair@popple.id.au>
  Cc: Michael Ellerman <mpe@ellerman.id.au>
  Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
  Cc: Andrew Morton <akpm@linux-foundation.org>
  Cc: Andrea Arcangeli <aarcange@redhat.com>
  Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm (Linus Torvalds, 2017-08-25; 3 files changed, -25/+102)
  Pull KVM fixes from Paolo Bonzini:
   "Bugfixes for x86, PPC and s390"

  * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
    KVM: PPC: Book3S: Fix race and leak in kvm_vm_ioctl_create_spapr_tce()
    KVM, pkeys: do not use PKRU value in vcpu->arch.guest_fpu.state
    KVM: x86: simplify handling of PKRU
    KVM: x86: block guest protection keys unless the host has them enabled
    KVM: PPC: Book3S HV: Add missing barriers to XIVE code and document them
    KVM: PPC: Book3S HV: Workaround POWER9 DD1.0 bug causing IPB bit loss
    KVM: PPC: Book3S HV: Use msgsync with hypervisor doorbells on POWER9
    KVM: s390: sthyi: fix specification exception detection
    KVM: s390: sthyi: fix sthyi inline assembly
  * KVM: PPC: Book3S: Fix race and leak in kvm_vm_ioctl_create_spapr_tce() (Paul Mackerras, 2017-08-25; 1 file changed, -22/+34)
  Nixiaoming pointed out that there is a memory leak in
  kvm_vm_ioctl_create_spapr_tce() if the call to anon_inode_getfd() fails; the
  memory allocated for the kvmppc_spapr_tce_table struct is not freed, and nor
  are the pages allocated for the iommu tables. In addition, we have already
  incremented the process's count of locked memory pages, and this doesn't get
  restored on error.

  David Hildenbrand pointed out that there is a race in that the function
  checks early on that there is not already an entry in the
  stt->iommu_tables list with the same LIOBN, but an entry with the same LIOBN
  could get added between then and when the new entry is added to the list.

  This fixes all three problems. To simplify things, we now call
  anon_inode_getfd() before placing the new entry in the list. The check for
  an existing entry is done while holding the kvm->lock mutex, immediately
  before adding the new entry to the list. Finally, on failure we now call
  kvmppc_account_memlimit to decrement the process's count of locked memory
  pages.

  Reported-by: Nixiaoming <nixiaoming@huawei.com>
  Reported-by: David Hildenbrand <david@redhat.com>
  Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  * KVM: PPC: Book3S HV: Add missing barriers to XIVE code and document them (Benjamin Herrenschmidt, 2017-08-24; 1 file changed, -2/+55)
  This adds missing memory barriers to order updates/tests of the virtual CPPR
  and MFRR, thus fixing a lost IPI problem. While at it also document all
  barriers in this file.

  This fixes a bug causing guest IPIs to occasionally get lost. The symptom
  then is hangs or stalls in the guest.

  Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  Tested-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
  Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
  * KVM: PPC: Book3S HV: Workaround POWER9 DD1.0 bug causing IPB bit loss (Benjamin Herrenschmidt, 2017-08-24; 1 file changed, -1/+10)
  This adds a workaround for a bug in POWER9 DD1 chips where changing the CPPR
  (Current Processor Priority Register) can cause bits in the IPB (Interrupt
  Pending Buffer) to get lost. Thankfully it only happens when manually
  manipulating CPPR which is quite rare. When it does happen it can cause
  interrupts to be delayed or lost.

  Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
  * KVM: PPC: Book3S HV: Use msgsync with hypervisor doorbells on POWER9 (Nicholas Piggin, 2017-08-24; 1 file changed, -0/+3)
  When msgsnd is used for IPIs to other cores, msgsync must be executed by the
  target to order stores performed on the source before its msgsnd (provided
  the source executes the appropriate sync).

  Fixes: 1704a81ccebc ("KVM: PPC: Book3S HV: Use msgsnd for IPIs to other cores on POWER9")
  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
* Merge tag 'powerpc-4.13-8' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux (Linus Torvalds, 2017-08-25; 3 files changed, -0/+20)
  Pull powerpc fix from Michael Ellerman:
   "Just one fix, to add a barrier in the switch_mm() code to make sure the mm
    cpumask update is ordered vs the MMU starting to load translations. As far
    as we know no one's actually hit the bug, but that's just luck.

    Thanks to Benjamin Herrenschmidt, Nicholas Piggin"

  * tag 'powerpc-4.13-8' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
    powerpc/mm: Ensure cpumask update is ordered
  * powerpc/mm: Ensure cpumask update is ordered (Benjamin Herrenschmidt, 2017-08-18; 3 files changed, -0/+20)
  There is no guarantee that the various isync's involved with the context
  switch will order the update of the CPU mask with the first TLB entry for
  the new context being loaded by the HW.

  Be safe here and add a memory barrier to order any subsequent load/store
  which may bring entries into the TLB.

  The corresponding barrier on the other side already exists as pte updates
  use pte_xchg() which uses __cmpxchg_u64 which has a sync after the atomic
  operation.

  Cc: stable@vger.kernel.org
  Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
  [mpe: Add comments in the code]
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
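  A sketch of the barrier placement described above, simplified from the
  switch_mm() path (treat names and placement as an illustration):

      #include <linux/mm_types.h>
      #include <linux/smp.h>

      static void mark_mm_cpu_active(struct mm_struct *next)
      {
              int cpu = smp_processor_id();

              if (!cpumask_test_cpu(cpu, mm_cpumask(next))) {
                      cpumask_set_cpu(cpu, mm_cpumask(next));
                      /*
                       * Order the cpumask update before any access that could
                       * bring TLB entries in for the new context.
                       */
                      smp_mb();
              }
      }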
* kernel/watchdog: fix Kconfig constraints for perf hardlockup watchdog (Nicholas Piggin, 2017-08-18; 1 file changed, -1/+1)
  Commit 05a4a9527931 ("kernel/watchdog: split up config options") lost the
  perf-based hardlockup detector's dependency on PERF_EVENTS, which can result
  in broken builds with some powerpc configurations.

  Restore the dependency. Add it in for x86 too, despite x86 always selecting
  PERF_EVENTS it seems reasonable to make the dependency explicit.

  Link: http://lkml.kernel.org/r/20170810114452.6673-1-npiggin@gmail.com
  Fixes: 05a4a9527931 ("kernel/watchdog: split up config options")
  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Acked-by: Don Zickus <dzickus@redhat.com>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge tag 'powerpc-4.13-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux (Linus Torvalds, 2017-08-18; 1 file changed, -2/+3)
  Pull powerpc fixes from Michael Ellerman:
   "A bug in the VSX register saving that could cause userspace FP/VMX
    register corruption. Never seen to happen (that we know of), was found by
    code inspection, but still tagged for stable given the consequences"

  * tag 'powerpc-4.13-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
    powerpc: Fix VSX enabling/flushing to also test MSR_FP and MSR_VEC
  * powerpc: Fix VSX enabling/flushing to also test MSR_FP and MSR_VEC (Benjamin Herrenschmidt, 2017-08-16; 1 file changed, -2/+3)
  VSX uses a combination of the old vector registers, the old FP registers and
  new "second halves" of the FP registers.

  Thus when we need to see the VSX state in the thread struct
  (flush_vsx_to_thread()) or when we'll use the VSX in the kernel
  (enable_kernel_vsx()) we need to ensure they are all flushed into the thread
  struct if either of them is individually enabled.

  Unfortunately we only tested if the whole VSX was enabled, not if they were
  individually enabled.

  Fixes: 72cd7b44bc99 ("powerpc: Uncomment and make enable_kernel_vsx() routine available")
  Cc: stable@vger.kernel.org # v4.3+
  Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
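  An illustration of the corrected condition: VSX state is considered live if
  any of VSX, Altivec or FP is enabled in the user MSR (a sketch, not the
  upstream diff):

      #include <linux/sched.h>
      #include <asm/reg.h>

      static bool vsx_state_live(struct task_struct *tsk)
      {
              return tsk->thread.regs &&
                     (tsk->thread.regs->msr & (MSR_VSX | MSR_VEC | MSR_FP));
      }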
* Merge tag 'powerpc-4.13-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux (Linus Torvalds, 2017-08-11; 8 files changed, -68/+101)
  Pull powerpc fixes from Michael Ellerman:
   "All fixes for code that went in this cycle.

    - a revert of an optimisation to the syscall exit path, which could lead
      to an oops on either older machines or machines with > 1TB of memory
    - disable some deep idle states if the firmware configuration for them
      fails
    - re-enable HARD/SOFT lockup detectors in defconfigs after a Kconfig
      change
    - six fairly small patches fixing bugs in our new watchdog code

    Thanks to: Gautham R Shenoy, Nicholas Piggin"

  * tag 'powerpc-4.13-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
    powerpc/watchdog: add locking around init/exit functions
    powerpc/watchdog: Fix marking of stuck CPUs
    powerpc/watchdog: Fix final-check recovered case
    powerpc/watchdog: Moderate touch_nmi_watchdog overhead
    powerpc/watchdog: Improve watchdog lock primitive
    powerpc: NMI IPI improve lock primitive
    powerpc/configs: Re-enable HARD/SOFT lockup detectors
    powerpc/powernv/idle: Disable LOSE_FULL_CONTEXT states when stop-api fails
    Revert "powerpc/64: Avoid restore_math call if possible in syscall exit"
  * powerpc/watchdog: add locking around init/exit functions (Nicholas Piggin, 2017-08-09; 1 file changed, -1/+10)
  When CPUs start and stop the watchdog, they manipulate shared data that is
  normally protected by the lock. Other CPUs can be running concurrently at
  this time, so it's a good idea to use locking here to be on the safe side.

  Remove the barrier which is undocumented and didn't do anything.

  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  * powerpc/watchdog: Fix marking of stuck CPUs (Nicholas Piggin, 2017-08-09; 1 file changed, -6/+9)
  When the SMP detector finds other CPUs stuck, it iterates over them and
  marks them as stuck. This pulls them out of the pending mask and allows the
  detector to continue with remaining good CPUs (if nmi_watchdog=panic is not
  enabled).

  The code to do that was buggy because when setting a CPU stuck, if the
  pending mask became empty, it resets it to keep the watchdog running.
  However the iterator will continue to run over the new pending mask and mark
  remaining good CPUs as stuck.

  Fix this by doing it with cpumask bitwise operations.

  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  * powerpc/watchdog: Fix final-check recovered case (Nicholas Piggin, 2017-08-09; 1 file changed, -1/+5)
  When the watchdog decides to panic, it takes the lock and double checks
  everything (to avoid races with the CPU being unstuck or panic()ed by
  something else).

  The exit label was misplaced and would result in all-CPUs backtrace and
  watchdog panic even in the case that the condition was found to be resolved.

  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  * powerpc/watchdog: Moderate touch_nmi_watchdog overhead (Nicholas Piggin, 2017-08-09; 1 file changed, -1/+3)
  Some code can go into a tight loop calling touch_nmi_watchdog (e.g.,
  stop_machine CPU hotplug code). This can cause contention on watchdog locks
  particularly if all CPUs with watchdog enabled are spinning in the loops.

  Avoid this storm of activity by running the watchdog timer callback from
  this path if we have exceeded the timer period since it was last run.

  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
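  A sketch of the rate limiting described above; wd_timer_period_ms and the
  per-cpu timestamp are assumed names, and the heartbeat work itself is
  elided:

      #include <linux/percpu.h>
      #include <asm/time.h>

      static DEFINE_PER_CPU(u64, wd_last_heartbeat_tb);
      static unsigned int wd_timer_period_ms = 1000;  /* assumed period */

      void arch_touch_nmi_watchdog(void)
      {
              u64 ticks = (u64)wd_timer_period_ms * tb_ticks_per_usec * 1000;
              u64 now = get_tb();

              /* Only run the heavier heartbeat work once per watchdog period. */
              if (now - __this_cpu_read(wd_last_heartbeat_tb) >= ticks) {
                      __this_cpu_write(wd_last_heartbeat_tb, now);
                      /* ... regular watchdog timer callback work goes here ... */
              }
      }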
  * powerpc/watchdog: Improve watchdog lock primitive (Nicholas Piggin, 2017-08-09; 1 file changed, -4/+9)
  - Hard-disable interrupts before taking the lock, which prevents soft-NMI
    re-entrancy and therefore can prevent deadlocks.
  - Use raw_ variants of local_irq_disable to avoid irq debugging.
  - When the lock is contended, spin at low SMT priority, using loads only,
    and with interrupts enabled (where possible).

  Some stalls have been noticed at high loads that go away with improved
  locking. There should not be so much locking contention in the first place
  (which is addressed in a subsequent patch), but locking should still be
  improved.

  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  * powerpc: NMI IPI improve lock primitive (Nicholas Piggin, 2017-08-09; 1 file changed, -3/+3)
  When the NMI IPI lock is contended, spin at low SMT priority, using loads
  only, and with interrupts enabled (where possible). This improves behaviour
  under high contention (e.g., a system crash when a number of CPUs are trying
  to enter the debugger).

  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  * powerpc/configs: Re-enable HARD/SOFT lockup detectors (Michael Ellerman, 2017-08-09; 3 files changed, -3/+6)
  In commit 05a4a9527931 ("kernel/watchdog: split up config options"),
  CONFIG_LOCKUP_DETECTOR was split into two separate config options,
  CONFIG_HARDLOCKUP_DETECTOR and CONFIG_SOFTLOCKUP_DETECTOR.

  Our defconfigs still have CONFIG_LOCKUP_DETECTOR=y, but that is no longer
  user selectable, and we don't mention the new options, so we end up with
  none of them enabled.

  So update the defconfigs to turn on the new SOFT and HARD options, the end
  result being the same as what we had previously.

  Fixes: 05a4a9527931 ("kernel/watchdog: split up config options")
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
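  A defconfig fragment of the kind described, using the option names from the
  watchdog Kconfig split:

      CONFIG_HARDLOCKUP_DETECTOR=y
      CONFIG_SOFTLOCKUP_DETECTOR=y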
  * powerpc/powernv/idle: Disable LOSE_FULL_CONTEXT states when stop-api fails (Gautham R. Shenoy, 2017-08-08; 1 file changed, -3/+38)
  Currently, we use the opal call opal_slw_set_reg() to inform the
  Sleep-Winkle Engine (SLW) to restore the contents of some of the Hypervisor
  state on wakeup from deep idle states that lose full hypervisor context
  (characterized by the flag OPAL_PM_LOSE_FULL_CONTEXT).

  However, the current code has a bug in that if opal_slw_set_reg() fails, we
  don't disable the use of these deep states (winkle on POWER8, stop4 onwards
  on POWER9).

  This patch fixes this bug by ensuring that if programming the sleep-winkle
  engine to restore the hypervisor states in pnv_save_sprs_for_deep_states()
  fails, then we exclude such states by clearing the
  OPAL_PM_LOSE_FULL_CONTEXT flag from supported_cpuidle_states. As a result
  POWER8 will be prevented from using winkle for CPU-Hotplug, and POWER9 will
  put the offlined CPUs to the default stop state when available.

  Further, we ensure in the initialization of the cpuidle-powernv driver to
  only include those states whose flags are present in
  supported_cpuidle_states, thereby skipping OPAL_PM_LOSE_FULL_CONTEXT states
  when they have been disabled due to stop-api failure.

  Fixes: 1e1601b38e6 ("powerpc/powernv/idle: Restore SPRs for deep idle states via stop API.")
  Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * Revert "powerpc/64: Avoid restore_math call if possible in syscall exit"Michael Ellerman2017-08-072-46/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit bc4f65e4cf9d6cc43e0e9ba0b8648cf9201cd55f. As reported by Andreas, this commit is causing unrecoverable SLB misses in the system call exit path: Unrecoverable exception 4100 at c00000000000a1ec Oops: Unrecoverable exception, sig: 6 [#1] SMP NR_CPUS=2 PowerMac ... CPU: 0 PID: 18626 Comm: rm Not tainted 4.13.0-rc3 #1 task: c00000018335e080 task.stack: c000000139e50000 NIP: c00000000000a1ec LR: c00000000000a118 CTR: 0000000000000000 REGS: c000000139e53bb0 TRAP: 4100 Not tainted (4.13.0-rc3) MSR: 9000000000001030 <SF,HV,ME,IR,DR> CR: 24000044 XER: 20000000 SOFTE: 1 GPR00: 0000000000000000 c000000139e53e30 c000000000abb500 fffffffffffffffe GPR04: c0000001eb866298 0000000000000000 0000000000000000 c00000018335e080 GPR08: 900000000000d032 0000000000000000 0000000000000002 fffffffffffff001 GPR12: c000000139e50000 c00000000ffff000 00003fffa8c0dca0 00003fffa8c0dc88 GPR16: 0000000010000000 0000000000000001 00003fffa8c0eaa0 0000000000000000 GPR20: 00003fffa8c27528 00003fffa8c27b00 0000000000000000 0000000000000000 GPR24: 00003fffa8c0d918 00003ffff1b3efa0 00003fffa8c26d68 0000000000000000 GPR28: 00003fffa8c249e8 00003fffa8c263d0 00003fffa8c27550 00003ffff1b3ef10 NIP [c00000000000a1ec] system_call_exit+0xc0/0x21c LR [c00000000000a118] system_call+0x58/0x6c Call Trace: [c000000139e53e30] [c00000000000a118] system_call+0x58/0x6c (unreliable) Instruction dump: 64a51000 7c6300d0 f8a101a0 4bffff9c 3c000000 60000006 780007c6 64000000 60000000 7c004039 4082001c e8ed0170 <88070b78> 88c70b79 7c003214 2c200000 This is caused by us trying to load THREAD_LOAD_FP with MSR_RI=0, and taking an SLB miss on the thread struct. Reported-by: Andreas Schwab <schwab@linux-m68k.org> Diagnosed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* Merge tag 'powerpc-4.13-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux (Linus Torvalds, 2017-08-04; 9 files changed, -21/+71)
  Pull powerpc fixes from Michael Ellerman:
   "Fixes for recently merged code:

    - a fix for the _PAGE_DEVMAP support, which was breaking KVM on Power9
      radix
    - avoid a (harmless) lockdep warning in the early SMP code
    - return failure for some uses of dma_set_mask() rather than falling back
      to 32-bits
    - fix stack setup in watchdog soft_nmi_common() to use emergency stack
    - fix of_irq_to_resource() error check in of_fsl_spi_probe()

    Two fixes going to stable:

    - fix saving of Transactional Memory SPRs in core dump
    - fix __check_irq_replay missing decrementer interrupt

    And two misc:

    - fix 64-bit boot wrapper build with non-biarch compiler
    - work around a POWER9 PMU hang after state-loss idle

    Thanks to: Alistair Popple, Aneesh Kumar K.V, Cyril Bur, Gustavo Romero,
    Jose Ricardo Ziviani, Laurent Vivier, Nicholas Piggin, Oliver O'Halloran,
    Sergei Shtylyov, Suraj Jitindar Singh, Thomas Gleixner"

  * tag 'powerpc-4.13-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
    powerpc/64: Fix __check_irq_replay missing decrementer interrupt
    powerpc/perf: POWER9 PMU stops after idle workaround
    powerpc/83xx/mpc832x_rdb: fix of_irq_to_resource() error check
    powerpc/64s: Fix stack setup in watchdog soft_nmi_common()
    powerpc/powernv/pci: Return failure for some uses of dma_set_mask()
    powerpc/boot: Fix 64-bit boot wrapper build with non-biarch compiler
    powerpc/smp: Call smp_ops->setup_cpu() directly on the boot CPU
    powerpc/tm: Fix saving of TM SPRs in core dump
    powerpc/mm: Fix pmd/pte_devmap() on non-leaf entries
  * powerpc/64: Fix __check_irq_replay missing decrementer interrupt (Nicholas Piggin, 2017-08-04; 1 file changed, -1/+14)
  If the decrementer wraps again and de-asserts the decrementer exception
  while hard-disabled, __check_irq_replay() has a test to notice the wrap when
  interrupts are re-enabled.

  The decrementer check must be done when clearing the PACA_IRQ_HARD_DIS
  flag, not when the PACA_IRQ_DEC flag is tested. Previously this worked
  because the decrementer interrupt was always the first one checked after
  clearing the hard disable flag, but HMI check was moved ahead of that, which
  introduced this bug.

  This can cause a missed decrementer interrupt if we soft-disable interrupts
  then take an HMI which is recorded in irq_happened, then hard-disable
  interrupts for > 4s to wrap the decrementer.

  Fixes: e0e0d6b7390b ("powerpc/64: Replay hypervisor maintenance interrupt first")
  Cc: stable@vger.kernel.org # v4.9+
  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>