From 17fea54bf0ab34fa09a06bbde2f58ed7bbdf9299 Mon Sep 17 00:00:00 2001 From: Borislav Petkov Date: Mon, 18 May 2015 10:07:17 +0200 Subject: x86/mce: Fix MCE severity messages Derek noticed that a critical MCE gets reported with the wrong error type description: [Hardware Error]: CPU 34: Machine Check Exception: 5 Bank 9: f200003f000100b0 [Hardware Error]: RIP !INEXACT! 10: {intel_idle+0xb1/0x170} [Hardware Error]: TSC 49587b8e321cb [Hardware Error]: PROCESSOR 0:306e4 TIME 1431561296 SOCKET 1 APIC 29 [Hardware Error]: Some CPUs didn't answer in synchronization [Hardware Error]: Machine check: Invalid ^^^^^^^ The last line with 'Invalid' should have printed the high level MCE error type description we get from mce_severity, i.e. something like: [Hardware Error]: Machine check: Action required: data load error in a user process this happens due to the fact that mce_no_way_out() iterates over all MCA banks and possibly overwrites the @msg argument which is used in the panic printing later. Change behavior to take the message of only and the (last) critical MCE it detects. Reported-by: Derek Signed-off-by: Borislav Petkov Cc: Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Tony Luck Link: http://lkml.kernel.org/r/1431936437-25286-3-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar --- arch/x86/kernel/cpu/mcheck/mce.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) (limited to 'arch/x86') diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c index e535533d5ab8..20190bdac9d5 100644 --- a/arch/x86/kernel/cpu/mcheck/mce.c +++ b/arch/x86/kernel/cpu/mcheck/mce.c @@ -708,6 +708,7 @@ static int mce_no_way_out(struct mce *m, char **msg, unsigned long *validp, struct pt_regs *regs) { int i, ret = 0; + char *tmp; for (i = 0; i < mca_cfg.banks; i++) { m->status = mce_rdmsrl(MSR_IA32_MCx_STATUS(i)); @@ -716,9 +717,11 @@ static int mce_no_way_out(struct mce *m, char **msg, unsigned long *validp, if (quirk_no_way_out) quirk_no_way_out(i, m, regs); } - if (mce_severity(m, mca_cfg.tolerant, msg, true) >= - MCE_PANIC_SEVERITY) + + if (mce_severity(m, mca_cfg.tolerant, &tmp, true) >= MCE_PANIC_SEVERITY) { + *msg = tmp; ret = 1; + } } return ret; } -- cgit v1.2.1 From e88221c50cadade0eb4f7f149f4967d760212695 Mon Sep 17 00:00:00 2001 From: Ingo Molnar Date: Wed, 20 May 2015 11:45:30 +0200 Subject: x86/fpu: Disable XSAVES* support for now The kernel's handling of 'compacted' xsave state layout is buggy: http://marc.info/?l=linux-kernel&m=142967852317199 I don't have such a system, and the description there is vague, but from extrapolation I guess that there were two kinds of bugs observed: - boot crashes, due to size calculations being wrong and the dynamic allocation allocating a too small xstate area. (This is now fixed in the new FPU code - but still present in stable kernels.) - FPU state corruption and ABI breakage: if signal handlers try to change the FPU state in standard format, which then the kernel tries to restore in the compacted format. These breakages are scary, but they only occur on a small number of systems that have XSAVES* CPU support. Yet we have had XSAVES support in the upstream kernel for a large number of stable kernel releases, and the fixes are involved and unproven. So do the safe resolution first: disable XSAVES* support and only use the standard xstate format. This makes the code work and is easy to backport. On top of this we can work on enabling (and testing!) proper compacted format support, without backporting pressure, on top of the new, cleaned up FPU code. Cc: Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Dave Hansen Cc: Fenghua Yu Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Oleg Nesterov Cc: Peter Zijlstra Cc: Thomas Gleixner Signed-off-by: Ingo Molnar --- arch/x86/kernel/i387.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+) (limited to 'arch/x86') diff --git a/arch/x86/kernel/i387.c b/arch/x86/kernel/i387.c index 009183276bb7..6185d3141219 100644 --- a/arch/x86/kernel/i387.c +++ b/arch/x86/kernel/i387.c @@ -173,6 +173,21 @@ static void init_thread_xstate(void) xstate_size = sizeof(struct i387_fxsave_struct); else xstate_size = sizeof(struct i387_fsave_struct); + + /* + * Quirk: we don't yet handle the XSAVES* instructions + * correctly, as we don't correctly convert between + * standard and compacted format when interfacing + * with user-space - so disable it for now. + * + * The difference is small: with recent CPUs the + * compacted format is only marginally smaller than + * the standard FPU state format. + * + * ( This is easy to backport while we are fixing + * XSAVES* support. ) + */ + setup_clear_cpu_cap(X86_FEATURE_XSAVES); } /* -- cgit v1.2.1 From 3f7352bf21f8fd7ba3e2fcef9488756f188e12be Mon Sep 17 00:00:00 2001 From: Alexei Starovoitov Date: Fri, 22 May 2015 15:42:55 -0700 Subject: x86: bpf_jit: fix compilation of large bpf programs x86 has variable length encoding. x86 JIT compiler is trying to pick the shortest encoding for given bpf instruction. While doing so the jump targets are changing, so JIT is doing multiple passes over the program. Typical program needs 3 passes. Some very short programs converge with 2 passes. Large programs may need 4 or 5. But specially crafted bpf programs may hit the pass limit and if the program converges on the last iteration the JIT compiler will be producing an image full of 'int 3' insns. Fix this corner case by doing final iteration over bpf program. Fixes: 0a14842f5a3c ("net: filter: Just In Time compiler for x86-64") Reported-by: Daniel Borkmann Signed-off-by: Alexei Starovoitov Tested-by: Daniel Borkmann Acked-by: Daniel Borkmann Signed-off-by: David S. Miller --- arch/x86/net/bpf_jit_comp.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) (limited to 'arch/x86') diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c index 99f76103c6b7..ddeff4844a10 100644 --- a/arch/x86/net/bpf_jit_comp.c +++ b/arch/x86/net/bpf_jit_comp.c @@ -966,7 +966,12 @@ void bpf_int_jit_compile(struct bpf_prog *prog) } ctx.cleanup_addr = proglen; - for (pass = 0; pass < 10; pass++) { + /* JITed image shrinks with every pass and the loop iterates + * until the image stops shrinking. Very large bpf programs + * may converge on the last pass. In such case do one more + * pass to emit the final image + */ + for (pass = 0; pass < 10 || image; pass++) { proglen = do_jit(prog, addrs, image, oldproglen, &ctx); if (proglen <= 0) { image = NULL; -- cgit v1.2.1 From fb5d432722e186c656285ccc088e35dbe24f6fd1 Mon Sep 17 00:00:00 2001 From: Dasaratharaman Chandramouli Date: Wed, 20 May 2015 09:49:34 -0700 Subject: tools/power turbostat: enable turbostat to support Knights Landing (KNL) Changes mainly to account for minor differences in Knights Landing(KNL): 1. KNL supports C1 and C6 core states. 2. KNL supports PC2, PC3 and PC6 package states. 3. KNL has a different encoding of the TURBO_RATIO_LIMIT MSR Signed-off-by: Dasaratharaman Chandramouli Signed-off-by: Len Brown --- arch/x86/include/uapi/asm/msr-index.h | 1 + 1 file changed, 1 insertion(+) (limited to 'arch/x86') diff --git a/arch/x86/include/uapi/asm/msr-index.h b/arch/x86/include/uapi/asm/msr-index.h index c469490db4a8..3c6bb342a48f 100644 --- a/arch/x86/include/uapi/asm/msr-index.h +++ b/arch/x86/include/uapi/asm/msr-index.h @@ -140,6 +140,7 @@ #define MSR_CORE_C3_RESIDENCY 0x000003fc #define MSR_CORE_C6_RESIDENCY 0x000003fd #define MSR_CORE_C7_RESIDENCY 0x000003fe +#define MSR_KNL_CORE_C6_RESIDENCY 0x000003ff #define MSR_PKG_C2_RESIDENCY 0x0000060d #define MSR_PKG_C8_RESIDENCY 0x00000630 #define MSR_PKG_C9_RESIDENCY 0x00000631 -- cgit v1.2.1 From dc4fdaf0e4839109169d8261814813816951c75f Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Thu, 28 May 2015 01:39:53 +0200 Subject: PCI / ACPI: Do not set ACPI companions for host bridges with parents Commit 97badf873ab6 (device property: Make it possible to use secondary firmware nodes) uncovered a bug in the x86 (and ia64) PCI host bridge initialization code that assumes bridge->bus->sysdata to always point to a struct pci_sysdata object which need not be the case (in particular, the Xen PCI frontend driver sets it to point to a different data type). If it is not the case, an incorrect pointer (or a piece of data that is not a pointer at all) will be passed to ACPI_COMPANION_SET() and that may cause interesting breakage to happen going forward. To work around this problem use the observation that the ACPI host bridge initialization always passes NULL as parent to pci_create_root_bus(), so if pcibios_root_bridge_prepare() sees a non-NULL parent of the bridge, it should not attempt to set an ACPI companion for it, because that means that pci_create_root_bus() has been called by someone else. Fixes: 97badf873ab6 (device property: Make it possible to use secondary firmware nodes) Reported-and-tested-by: Sander Eikelenboom Signed-off-by: Rafael J. Wysocki Acked-by: Bjorn Helgaas --- arch/x86/pci/acpi.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) (limited to 'arch/x86') diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c index d93963340c3c..14a63ed6fe09 100644 --- a/arch/x86/pci/acpi.c +++ b/arch/x86/pci/acpi.c @@ -482,9 +482,16 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root) int pcibios_root_bridge_prepare(struct pci_host_bridge *bridge) { - struct pci_sysdata *sd = bridge->bus->sysdata; - - ACPI_COMPANION_SET(&bridge->dev, sd->companion); + /* + * We pass NULL as parent to pci_create_root_bus(), so if it is not NULL + * here, pci_create_root_bus() has been called by someone else and + * sysdata is likely to be different from what we expect. Let it go in + * that case. + */ + if (!bridge->dev.parent) { + struct pci_sysdata *sd = bridge->bus->sysdata; + ACPI_COMPANION_SET(&bridge->dev, sd->companion); + } return 0; } -- cgit v1.2.1