summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* cxl: Drop commands if the PCI channel is not in normal stateDaniel Axtens2015-08-145-12/+155
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the PCI channel has gone down, don't attempt to poke the hardware. We need to guard every time cxl_whatever_(read|write) is called. This is because a call to those functions will dereference an offset into an mmio register, and the mmio mappings get invalidated in the EEH teardown. Check in the read/write functions in the header. We give them the same semantics as usual PCI operations: - a write to a channel that is down is ignored. - a read from a channel that is down returns all fs. Also, we try to access the MMIO space of a vPHB device as part of the PCI disable path. Because that's a read that bypasses most of our usual checks, we handle it explicitly. As far as user visible warnings go: - Check link state in file ops, return -EIO if down. - Be reasonably quiet if there's an error in a teardown path, or when we already know the hardware is going down. - Throw a big WARN if someone tries to start a CXL operation while the card is down. This gives a useful stacktrace for debugging whatever is doing that. Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* cxl: Convert MMIO read/write macros to inline functionsDaniel Axtens2015-08-141-16/+35
| | | | | | | | We're about to make these more complex, so make them functions first. Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/eeh: Probe after unbalanced kref checkDaniel Axtens2015-08-141-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | In the complete hotplug case, EEH PEs are supposed to be released and set to NULL. Normally, this is done by eeh_remove_device(), which is called from pcibios_release_device(). However, if something is holding a kref to the device, it will not be released, and the PE will remain. eeh_add_device_late() has a check for this which will explictly destroy the PE in this case. This check in eeh_add_device_late() occurs after a call to eeh_ops->probe(). On PowerNV, probe is a pointer to pnv_eeh_probe(), which will exit without probing if there is an existing PE. This means that on PowerNV, devices with outstanding krefs will not be rediscovered by EEH correctly after a complete hotplug. This is affecting CXL (CAPI) devices in the field. Put the probe after the kref check so that the PE is destroyed and affected devices are correctly rediscovered by EEH. Fixes: d91dafc02f42 ("powerpc/eeh: Delay probing EEH device during hotplug") Cc: stable@vger.kernel.org Cc: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Daniel Axtens <dja@axtens.net> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc: Add an inline function to update POWER8 HID0Gautham R. Shenoy2015-08-142-2/+11
| | | | | | | | | | | | | | Section 3.7 of Version 1.2 of the Power8 Processor User's Manual prescribes that updates to HID0 be preceded by a SYNC instruction and followed by an ISYNC instruction (Page 91). Create an inline function name update_power8_hid0() which follows this recipe and invoke it from the static split core path. Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Reviewed-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Tested-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/prom: Use DRCONF flags while processing detected LMBsAnshuman Khandual2015-08-121-3/+4
| | | | | | | | Replace hard coded values with existing DRCONF flags while procesing detected LMBs from the device tree. Does not change any functionality. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/xmon: Drop the valid variable completely in dump_segments()Anshuman Khandual2015-08-121-4/+3
| | | | | | | | | | | | | The value of 'valid' is always zero when 'esid' is zero, and if 'esid' is non-zero then the value of 'valid' is irrelevant because we are using logical or in the if expression. In fact 'valid' can be dropped completely from dump_segments() by simply doing the check with SLB_ESID_V directly in the if. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> [mpe: Rewrite change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/prom: Simplify the logic to fetch SLB sizeAnshuman Khandual2015-08-121-11/+7
| | | | | | | | | | | | | | | | The code to fetch the SLB size from the device tree wants to first look for "slb-size" and then if that's not found "ibm,slb-size". We can simplify the code by looking for the properties and then if we find one of them we set mmu_slb_size. We also change the function name from check_cpu_slb_size() to init_mmu_slb_size() as the function doesn't check anything, it only initialises mmu_slb_size. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> [mpe: Rewrite change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/slb: Add documentation on runtime patching of SLB encodingAnshuman Khandual2015-08-121-1/+15
| | | | | | | | | This patch adds some documentation to patch_slb_encoding() explaining how it works. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> [mpe: Update change log and mention the signedness of the immediate] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/slb: Rename all the 'slot' occurrences to 'entry'Anshuman Khandual2015-08-121-4/+3
| | | | | | | | | The SLB code uses 'slot' and 'entry' interchangeably, change it to always use 'entry'. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> [mpe: Rewrite change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/slb: Remove a duplicate extern variableAnshuman Khandual2015-08-121-1/+0
| | | | | | | | | This patch just removes one redundant entry for one extern variable 'slb_compare_rr_to_size' from the scope. This patch does not change any functionality. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* cxl: sparse: Silence iomem warning in debugfs file creationDaniel Axtens2015-08-121-1/+1
| | | | | | | | | | | | | | | | | An IO address, tagged with __iomem, is passed to debugfs_create_file as private data. This requires that it be cast to void *. The cast drops the __iomem annotation and so creates a sparse warning: drivers/misc/cxl/debugfs.c:51:57: warning: cast removes address space of expression The address space marker is added back in the file operations (fops_io_u64). Silence the warning with __force. Signed-off-by: Daniel Axtens <dja@axtens.net> Acked-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* cxl: sparse: Make declarations staticDaniel Axtens2015-08-122-4/+4
| | | | | | | | | | | | | | | A few declarations were identified by sparse as needing to be static: drivers/misc/cxl/irq.c:408:6: warning: symbol 'afu_irq_name_free' was not declared. Should it be static? drivers/misc/cxl/irq.c:467:6: warning: symbol 'afu_register_hwirqs' was not declared. Should it be static? drivers/misc/cxl/file.c:254:6: warning: symbol 'afu_compat_ioctl' was not declared. Should it be static? drivers/misc/cxl/file.c:399:30: warning: symbol 'afu_master_fops' was not declared. Should it be static? Make them static. Signed-off-by: Daniel Axtens <dja@axtens.net> Acked-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* cxl: Compile with -WerrorDaniel Axtens2015-08-111-0/+2
| | | | | | | | It's a good idea, and it brings us in line with the rest of arch/powerpc. Signed-off-by: Daniel Axtens <dja@axtens.net> Acked-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/ftrace: add powerpc timebase as a trace clock sourceNaveen N. Rao2015-08-065-1/+40
| | | | | | | | | | | | Add a new powerpc-specific trace clock using the timebase register, similar to x86-tsc. This gives us - a fast, monotonic, hardware clock source for trace entries, and - a clock that can be used to correlate events across cpus as well as across hypervisor and guests. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/4xx: Fix return value check in hsta_msi_probe()Wei Yongjun2015-08-061-2/+2
| | | | | | | | | In case of error, the functions platform_get_resource() and kmalloc() returns NULL not ERR_PTR(). The IS_ERR() test in the return value check should be replaced with NULL test. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* windfarm: remove three exported but unused functionsPaul Bolle2015-08-062-46/+0
| | | | | | | | wf_find_control(), wf_find_sensor(), and wf_is_overtemp() are exported but unused. Remove these three functions. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* windfarm: make wf_critical_overtemp() staticPaul Bolle2015-08-061-2/+1
| | | | | | | | | wf_critical_overtemp() is exported. But nothing uses that export. That's unsurprising because there's no header that defines it. Stop exporting that function and make it static. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* windfarm: decrement client count when unregisteringPaul Bolle2015-08-061-1/+1
| | | | | | | | | | | wf_unregister_client() increments the client count when a client unregisters. That is obviously incorrect. Decrement that client count instead. Fixes: 75722d3992f5 ("[PATCH] ppc64: Thermal control for SMU based machines") Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc: Remove redundant breaksJoe Perches2015-08-062-2/+0
| | | | | | | | | break; break; isn't useful. Remove one. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc: pci: use %pR for printing struct resourceKevin Hao2015-08-061-54/+18
| | | | | | | | | Use %pR to simplify the debug code. This also make the debug info more readable. Signed-off-by: Kevin Hao <haokexin@gmail.com> [mpe: Unsplit multi-line printk strings] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* cxl: Don't ignore add_process_element() result when attaching contextDaniel Axtens2015-08-061-3/+1
| | | | | | | | | | Currently when attaching a context in dedicated mode, we ignore the result of add_process_element(), which could potentially fail. If add_process_element() returns an error, pass it back to the caller. Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/powernv: Invoke opal_cec_reboot2() on unrecoverable HMI.Mahesh Salgaonkar2015-08-061-0/+19
| | | | | | | | | | Invoke new opal_cec_reboot2() call with reboot type OPAL_REBOOT_PLATFORM_ERROR (for unrecoverable HMI interrupts) to inform BMC/OCC about this error, so that BMC can collect relevant data for error analysis and decide what component to de-configure before rebooting. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/powernv: Invoke opal_cec_reboot2() on unrecoverable machine check ↵Mahesh Salgaonkar2015-08-064-1/+45
| | | | | | | | | | | | | | | | | | | | | | | | errors. On non-recoverable MCE errors in kernel space, Linux kernel panics and system reboots. On BMC based system opal-prd runs as a daemon in the host. Hence, kernel crash may prevent opal-prd to detect and analyze this MCE error. This may land us in a situation where the faulty memory never gets de-configured and Linux would keep hitting same MCE error again and again. If this happens in early stage of kernel initialization, then Linux will keep crashing and rebooting in a loop. This patch fixes this issue by invoking new opal_cec_reboot2() call with reboot type OPAL_REBOOT_PLATFORM_ERROR to inform BMC/OCC about this error, so that BMC can collect relevant data for error analysis and decide what component to de-configure before rebooting. This patch is dependent on OPAL patchset posted on skiboot mailing list at https://lists.ozlabs.org/pipermail/skiboot/2015-July/001771.html that introduces opal_cec_reboot2() opal call. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/powernv: Pull all HMI events before panic.Mahesh Salgaonkar2015-08-061-2/+24
| | | | | | | | | | | | In the event of unrecovered HMI the existing code panics as soon as it receives the first unrecovered HMI event. This makes host to report partial information about HMIs before panic. There may be more errors which would have caused the HMI and hence more HMI event would have been generated waiting to be pulled by host. This patch implements a logic to pull and display all the HMI event before going down panic path. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/powernv: display reason for Malfunction Alert HMI.Mahesh Salgaonkar2015-08-062-0/+193
| | | | | | | | | | | The V2 version of HMI event now carries additional information for Malfunction Alert. It now contains error information about CORE and NX checkstop. This patch checks and displays the check stop reason before panic. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Acked-by: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* selftests/seccomp: Add powerpc supportMichael Ellerman2015-07-301-1/+8
| | | | | | | | | | Wire up the syscall number and regs so the tests work on powerpc. With the powerpc kernel support just merged, all tests pass on ppc64, ppc64 (compat), ppc64le, ppc, ppc64e and ppc64e (compat). Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* selftests/seccomp: Make seccomp tests work on big endianMichael Ellerman2015-07-301-0/+6
| | | | | | | | | | | | The seccomp_bpf test uses BPF_LD|BPF_W|BPF_ABS to load 32-bit values from seccomp_data->args. On big endian machines this will load the high word of the argument, which is not what the test wants. Borrow a hack from samples/seccomp/bpf-helper.h which changes the offset on big endian to account for this. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Kees Cook <keescook@chromium.org>
* powerpc/kernel: Enable seccomp filterMichael Ellerman2015-07-302-1/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit enables seccomp filter on powerpc, now that we have all the necessary pieces in place. To support seccomp's desire to modify the syscall return value under some circumstances, we use a different ABI to the ptrace ABI. That is we use r3 as the syscall return value, and orig_gpr3 is the first syscall parameter. This means the seccomp code, or a ptracer via SECCOMP_RET_TRACE, will see -ENOSYS preloaded in r3. This is identical to the behaviour on x86, and allows seccomp or the ptracer to either leave the -ENOSYS or change it to something else, as well as rejecting or not the syscall by modifying r0. If seccomp does not reject the syscall, we restore the register state to match what ptrace and audit expect, ie. r3 is the first syscall parameter again. We do this restore using orig_gpr3, which may have been modified by seccomp, which allows seccomp to modify the first syscall paramater and allow the syscall to proceed. We need to #ifdef the the additional handling of r3 for seccomp, so move it all out of line. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Kees Cook <keescook@chromium.org>
* powerpc/kernel: Add SIG_SYS support for compat tasksMichael Ellerman2015-07-292-0/+12
| | | | | | | | | | | | | | | | | | | | | SIG_SYS was added in commit a0727e8ce513 "signal, x86: add SIGSYS info and make it synchronous." Because we use the asm-generic struct siginfo, we got support for SIG_SYS for free as part of that commit. However there was no compat handling added for powerpc. That means we've been advertising the existence of signfo._sifields._sigsys to compat tasks, but not actually filling in the fields correctly. Luckily it looks like no one has noticed, presumably because the only user of SIGSYS in the kernel is seccomp filter, which we don't support yet. So before we enable seccomp filter, add compat handling for SIGSYS. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Kees Cook <keescook@chromium.org>
* powerpc: Change syscall_get_nr() to return intMichael Ellerman2015-07-291-3/+8
| | | | | | | | | | | | | | | | | | | | | | | | The documentation for syscall_get_nr() in asm-generic says: Note this returns int even on 64-bit machines. Only 32 bits of system call number can be meaningful. If the actual arch value is 64 bits, this truncates to 32 bits so 0xffffffff means -1. However our implementation was never updated to reflect this. Generally it's not important, but there is once case where it matters. For seccomp filter with SECCOMP_RET_TRACE, the tracer will set regs->gpr[0] to -1 to reject the syscall. When the task is a compat task, this means we end up with 0xffffffff in r0 because ptrace will zero extend the 32-bit value. If syscall_get_nr() returns an unsigned long, then a 64-bit kernel will see a positive value in r0 and will incorrectly allow the syscall through seccomp. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Kees Cook <keescook@chromium.org>
* powerpc: Use orig_gpr3 in syscall_get_arguments()Michael Ellerman2015-07-291-3/+13
| | | | | | | | | | | | | | | | | | | | | | | | | Currently syscall_get_arguments() is used by syscall tracepoints, and collect_syscall() which is used in some debugging as well as /proc/pid/syscall. The current implementation just copies regs->gpr[3 .. 5] out, which is fine for all the current use cases. When we enable seccomp filter, that will also start using syscall_get_arguments(). However for seccomp filter we want to use r3 as the return value of the syscall, and orig_gpr3 as the first parameter. This will allow seccomp to modify the return value in r3. To support this we need to modify syscall_get_arguments() to return orig_gpr3 instead of r3. This is safe for all uses because orig_gpr3 always contains the r3 value that was passed to the syscall. We store it in the syscall entry path and never modify it. Update syscall_set_arguments() while we're here, even though it's never used. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Kees Cook <keescook@chromium.org>
* powerpc: Rework syscall_get_arguments() so there is only one loopMichael Ellerman2015-07-291-11/+8
| | | | | | | | | | Currently syscall_get_arguments() has two loops, one for compat and one for regular tasks. In prepartion for the next patch, which changes which registers we use, switch it to only have one loop, so we only have one place to update. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Kees Cook <keescook@chromium.org>
* powerpc: Don't negate error in syscall_set_return_value()Michael Ellerman2015-07-291-1/+7
| | | | | | | | | | | | | | | | | Currently the only caller of syscall_set_return_value() is seccomp filter, which is not enabled on powerpc. This means we have not noticed that our implementation of syscall_set_return_value() negates error, even though the value passed in is already negative. So remove the negation in syscall_set_return_value(), and expect the caller to do it like all other implementations do. Also add a comment about the ccr handling. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Kees Cook <keescook@chromium.org>
* powerpc: Drop unused syscall_get_error()Michael Ellerman2015-07-291-6/+0
| | | | | | | | | | | | | | | syscall_get_error() is unused, and never has been. It's also probably wrong, as it negates r3 before returning it, but that depends on what the caller is expecting. It also doesn't deal with compat, and doesn't deal with TIF_NOERROR. Although we could fix those, until it has a caller and it's clear what semantics the caller wants it's just untested code. So drop it. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Kees Cook <keescook@chromium.org>
* powerpc/kernel: Change the do_syscall_trace_enter() APIMichael Ellerman2015-07-293-17/+58
| | | | | | | | | | | | | | | | | | | | | | The API for calling do_syscall_trace_enter() is currently sensible enough, it just returns the (modified) syscall number. However once we enable seccomp filter it will get more complicated. When seccomp filter runs, the seccomp kernel code (via SECCOMP_RET_ERRNO), or a ptracer (via SECCOMP_RET_TRACE), may reject the syscall and *may* or may *not* set a return value in r3. That means the assembler that calls do_syscall_trace_enter() can not blindly return ENOSYS, it needs to only return ENOSYS if a return value has not already been set. There is no way to implement that logic with the current API. So change the do_syscall_trace_enter() API to make it deal with the return code juggling, and the assembler can then just return whatever return code it is given. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Kees Cook <keescook@chromium.org>
* powerpc/kernel: Switch to using MAX_ERRNOMichael Ellerman2015-07-293-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently on powerpc we have our own #define for the highest (negative) errno value, called _LAST_ERRNO. This is defined to be 516, for reasons which are not clear. The generic code, and x86, use MAX_ERRNO, which is defined to be 4095. In particular seccomp uses MAX_ERRNO to restrict the value that a seccomp filter can return. Currently with the mismatch between _LAST_ERRNO and MAX_ERRNO, a seccomp tracer wanting to return 600, expecting it to be seen as an error, would instead find on powerpc that userspace sees a successful syscall with a return value of 600. To avoid this inconsistency, switch powerpc to use MAX_ERRNO. We are somewhat confident that generic syscalls that can return a non-error value above negative MAX_ERRNO have already been updated to use force_successful_syscall_return(). I have also checked all the powerpc specific syscalls, and believe that none of them expect to return a non-error value between -MAX_ERRNO and -516. So this change should be safe ... Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Kees Cook <keescook@chromium.org>
* powerpc/perf: Change type of the bhrb_users variableAnshuman Khandual2015-07-271-2/+2
| | | | | | | | | This patch just changes data type of bhrb_users variable from int to unsigned int because it never contains a negative value. Reported-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/perf/hv-24x7: Simplify extracting counter from result bufferSukadev Bhattiprolu2015-07-251-3/+1
| | | | | | | | Simplify code that extracts a 24x7 counter from the HCALL's result buffer. Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/perf/hv-24x7: Whitespace - fix parameter alignmentSukadev Bhattiprolu2015-07-251-10/+10
| | | | | | | Fix parameter alignment to be consistent with coding style. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc: Use hardware RNG for arch_get_random_seed_* not arch_get_random_*Paul Mackerras2015-07-234-17/+17
| | | | | | | | | | | | | | | | | | | | | | The hardware RNG on POWER8 and POWER7+ can be relatively slow, since it can only supply one 64-bit value per microsecond. Currently we read it in arch_get_random_long(), but that slows down reading from /dev/urandom since the code in random.c calls arch_get_random_long() for every longword read from /dev/urandom. Since the hardware RNG supplies high-quality entropy on every read, it matches the semantics of arch_get_random_seed_long() better than those of arch_get_random_long(). Therefore this commit makes the code use the POWER8/7+ hardware RNG only for arch_get_random_seed_{long,int} and not for arch_get_random_{long,int}. This won't affect any other PowerPC-based platforms because none of them currently support a hardware RNG. To make it clear that the ppc_md function pointer is used for arch_get_random_seed_*, we rename it from get_random_long to get_random_seed. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/rtas: Introduce rtas_get_sensor_fast() for IRQ handlersThomas Huth2015-07-233-1/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The EPOW interrupt handler uses rtas_get_sensor(), which in turn uses rtas_busy_delay() to wait for RTAS becoming ready in case it is necessary. But rtas_busy_delay() is annotated with might_sleep() and thus may not be used by interrupts handlers like the EPOW handler! This leads to the following BUG when CONFIG_DEBUG_ATOMIC_SLEEP is enabled: BUG: sleeping function called from invalid context at arch/powerpc/kernel/rtas.c:496 in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.2.0-rc2-thuth #6 Call Trace: [c00000007ffe7b90] [c000000000807670] dump_stack+0xa0/0xdc (unreliable) [c00000007ffe7bc0] [c0000000000e1f14] ___might_sleep+0x134/0x180 [c00000007ffe7c20] [c00000000002aec0] rtas_busy_delay+0x30/0xd0 [c00000007ffe7c50] [c00000000002bde4] rtas_get_sensor+0x74/0xe0 [c00000007ffe7ce0] [c000000000083264] ras_epow_interrupt+0x44/0x450 [c00000007ffe7d90] [c000000000120260] handle_irq_event_percpu+0xa0/0x300 [c00000007ffe7e70] [c000000000120524] handle_irq_event+0x64/0xc0 [c00000007ffe7eb0] [c000000000124dbc] handle_fasteoi_irq+0xec/0x260 [c00000007ffe7ef0] [c00000000011f4f0] generic_handle_irq+0x50/0x80 [c00000007ffe7f20] [c000000000010f3c] __do_irq+0x8c/0x200 [c00000007ffe7f90] [c0000000000236cc] call_do_irq+0x14/0x24 [c00000007e6f39e0] [c000000000011144] do_IRQ+0x94/0x110 [c00000007e6f3a30] [c000000000002594] hardware_interrupt_common+0x114/0x180 Fix this issue by introducing a new rtas_get_sensor_fast() function that does not use rtas_busy_delay() - and thus can only be used for sensors that do not cause a BUSY condition - known as "fast" sensors. The EPOW sensor is defined to be "fast" in sPAPR - mpe. Fixes: 587f83e8dd50 ("powerpc/pseries: Use rtas_get_sensor in RAS code") Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/rtas: Replace magic values with definesThomas Huth2015-07-231-3/+5
| | | | | | | | | rtas.h already has some nice #defines for RTAS return status codes - let's use them instead of hard-coded "magic" values! Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/eeh: Dump PHB diag-data for non-existing PEGavin Shan2015-07-211-1/+9
| | | | | | | | | | When detecting EEH error on non-existing PE, including the reserved one, the PE is simply unfrozen without dumping the PHB diag-data, which is useful for locating the root cause of the EEH error. The patch dumps the PHB diag-data when non-existing PE reports error. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/eeh: Fix wrong printed PE numberGavin Shan2015-07-211-1/+1
| | | | | | | | | | | On LE kernel, the non-existing PE number in BE format derived from skiboot firmware isn't converted to LE format properly as following kernel log indicates: EEH: Clear non-existing PHB#4-PE#200000000000000 Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/signal: Add helper function to fetch quad word aligned pointerAnshuman Khandual2015-07-211-5/+16
| | | | | | | | | | This patch adds one helper function 'sigcontext_vmx_regs' which computes quad word aligned pointer for 'vmx_reserve' array element in sigcontext structure making the code more readable. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> [mpe: Reword comment and fix build for CONFIG_ALTIVEC=n] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/signal: Fix confusing header documentation in sigcontext.hAnshuman Khandual2015-07-161-2/+2
| | | | | | | | | | | | | | | Commit ce48b2100785 "powerpc: Add VSX context save/restore, ptrace and signal support" expanded the 'vmx_reserve' array element to contain 101 double words, but the comment block above was not updated. Also reorder the constants in the array size declaration to reflect the logic mentioned in the comment block above. This change helps in explaining how the HW registers are represented in the array. But no functional change. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> [mpe: Reworded change log and added whitespace around +'s] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/tm: Drop tm_orig_msr from thread_structAnshuman Khandual2015-07-162-8/+7
| | | | | | | | | | | | Currently tm_orig_msr is getting used during process context switch only. Then there is ckpt_regs which saves the checkpointed userspace context The MSR slot contained in ckpt_regs structure can be used during process context switch instead of tm_orig_msr, thus allowing us to drop it from thread_struct structure. This patch does that change. Acked-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* cxl: Destroy afu->contexts_idr on release of an afuJohannes Thumshirn2015-07-161-0/+1
| | | | | | | | | Destroy afu->contexts_idr on release of an afu, reclaiming the allocated memory. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* cxl: Destroy cxl_adapter_idr on module_exitJohannes Thumshirn2015-07-161-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Destroy cxl_adapter_idr on module exit, reclaiming the allocated memory. This was detected by the following semantic patch (written by Luis Rodriguez <mcgrof@suse.com>) <SmPL> @ defines_module_init @ declarer name module_init, module_exit; declarer name DEFINE_IDR; identifier init; @@ module_init(init); @ defines_module_exit @ identifier exit; @@ module_exit(exit); @ declares_idr depends on defines_module_init && defines_module_exit @ identifier idr; @@ DEFINE_IDR(idr); @ on_exit_calls_destroy depends on declares_idr && defines_module_exit @ identifier declares_idr.idr, defines_module_exit.exit; @@ exit(void) { ... idr_destroy(&idr); ... } @ missing_module_idr_destroy depends on declares_idr && defines_module_exit && !on_exit_calls_destroy @ identifier declares_idr.idr, defines_module_exit.exit; @@ exit(void) { ... +idr_destroy(&idr); } </SmPL> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/powernv: Add poweroff (EPOW, DPO) events support for PowerNV platformVipin K Parashar2015-07-164-18/+173
| | | | | | | | | | | | | | | | This patch adds support for OPAL EPOW (Environmental and Power Warnings) and DPO (Delayed Power Off) events for the PowerNV platform. These events are generated on FSP (Flexible Service Processor) based systems. EPOW events are generated due to various critical system conditions that require system shutdown. A few examples of these conditions are high ambient temperature or system running on UPS power with low UPS battery. DPO event is generated in response to admin initiated system shutdown request. Upon receipt of EPOW and DPO events the host kernel invokes orderly_poweroff() for performing graceful system shutdown. Signed-off-by: Vipin K Parashar <vipin@linux.vnet.ibm.com> Acked-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
OpenPOWER on IntegriCloud