summaryrefslogtreecommitdiffstats
path: root/arch/x86/kernel/apic
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'x86-urgent-2020-02-09' of ↵Linus Torvalds2020-02-092-8/+150
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A set of fixes for X86: - Ensure that the PIT is set up when the local APIC is disable or configured in legacy mode. This is caused by an ordering issue introduced in the recent changes which skip PIT initialization when the TSC and APIC frequencies are already known. - Handle malformed SRAT tables during early ACPI parsing which caused an infinite loop anda boot hang. - Fix a long standing race in the affinity setting code which affects PCI devices with non-maskable MSI interrupts. The problem is caused by the non-atomic writes of the MSI address (destination APIC id) and data (vector) fields which the device uses to construct the MSI message. The non-atomic writes are mandated by PCI. If both fields change and the device raises an interrupt after writing address and before writing data, then the MSI block constructs a inconsistent message which causes interrupts to be lost and subsequent malfunction of the device. The fix is to redirect the interrupt to the new vector on the current CPU first and then switch it over to the new target CPU. This allows to observe an eventually raised interrupt in the transitional stage (old CPU, new vector) to be observed in the APIC IRR and retriggered on the new target CPU and the new vector. The potential spurious interrupts caused by this are harmless and can in the worst case expose a buggy driver (all handlers have to be able to deal with spurious interrupts as they can and do happen for various reasons). - Add the missing suspend/resume mechanism for the HYPERV hypercall page which prevents resume hibernation on HYPERV guests. This change got lost before the merge window. - Mask the IOAPIC before disabling the local APIC to prevent potentially stale IOAPIC remote IRR bits which cause stale interrupt lines after resume" * tag 'x86-urgent-2020-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/apic: Mask IOAPIC entries when disabling the local APIC x86/hyperv: Suspend/resume the hypercall page for hibernation x86/apic/msi: Plug non-maskable MSI affinity race x86/boot: Handle malformed SRAT tables during early ACPI parsing x86/timer: Don't skip PIT setup when APIC is disabled or in legacy mode
| * x86/apic: Mask IOAPIC entries when disabling the local APICTony W Wang-oc2020-02-071-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a system suspends, the local APIC is disabled in the suspend sequence, but the IOAPIC is left in the current state. This means unmasked interrupt lines stay unmasked. This is usually the case for IOAPIC pin 9 to which the ACPI interrupt is connected. That means that in suspended state the IOAPIC can respond to an external interrupt, e.g. the wakeup via keyboard/RTC/ACPI, but the interrupt message cannot be handled by the disabled local APIC. As a consequence the Remote IRR bit is set, but the local APIC does not send an EOI to acknowledge it. This causes the affected interrupt line to become stale and the stale Remote IRR bit will cause a hang when __synchronize_hardirq() is invoked for that interrupt line. To prevent this, mask all IOAPIC entries before disabling the local APIC. The resume code already has the unmask operation inside. [ tglx: Massaged changelog ] Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/1579076539-7267-1-git-send-email-TonyWWang-oc@zhaoxin.com
| * x86/apic/msi: Plug non-maskable MSI affinity raceThomas Gleixner2020-02-011-3/+125
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Evan tracked down a subtle race between the update of the MSI message and the device raising an interrupt internally on PCI devices which do not support MSI masking. The update of the MSI message is non-atomic and consists of either 2 or 3 sequential 32bit wide writes to the PCI config space. - Write address low 32bits - Write address high 32bits (If supported by device) - Write data When an interrupt is migrated then both address and data might change, so the kernel attempts to mask the MSI interrupt first. But for MSI masking is optional, so there exist devices which do not provide it. That means that if the device raises an interrupt internally between the writes then a MSI message is sent built from half updated state. On x86 this can lead to spurious interrupts on the wrong interrupt vector when the affinity setting changes both address and data. As a consequence the device interrupt can be lost causing the device to become stuck or malfunctioning. Evan tried to handle that by disabling MSI accross an MSI message update. That's not feasible because disabling MSI has issues on its own: If MSI is disabled the PCI device is routing an interrupt to the legacy INTx mechanism. The INTx delivery can be disabled, but the disablement is not working on all devices. Some devices lose interrupts when both MSI and INTx delivery are disabled. Another way to solve this would be to enforce the allocation of the same vector on all CPUs in the system for this kind of screwed devices. That could be done, but it would bring back the vector space exhaustion problems which got solved a few years ago. Fortunately the high address (if supported by the device) is only relevant when X2APIC is enabled which implies interrupt remapping. In the interrupt remapping case the affinity setting is happening at the interrupt remapping unit and the PCI MSI message is programmed only once when the PCI device is initialized. That makes it possible to solve it with a two step update: 1) Target the MSI msg to the new vector on the current target CPU 2) Target the MSI msg to the new vector on the new target CPU In both cases writing the MSI message is only changing a single 32bit word which prevents the issue of inconsistency. After writing the final destination it is necessary to check whether the device issued an interrupt while the intermediate state #1 (new vector, current CPU) was in effect. This is possible because the affinity change is always happening on the current target CPU. The code runs with interrupts disabled, so the interrupt can be detected by checking the IRR of the local APIC. If the vector is pending in the IRR then the interrupt is retriggered on the new target CPU by sending an IPI for the associated vector on the target CPU. This can cause spurious interrupts on both the local and the new target CPU. 1) If the new vector is not in use on the local CPU and the device affected by the affinity change raised an interrupt during the transitional state (step #1 above) then interrupt entry code will ignore that spurious interrupt. The vector is marked so that the 'No irq handler for vector' warning is supressed once. 2) If the new vector is in use already on the local CPU then the IRR check might see an pending interrupt from the device which is using this vector. The IPI to the new target CPU will then invoke the handler of the device, which got the affinity change, even if that device did not issue an interrupt 3) If the new vector is in use already on the local CPU and the device affected by the affinity change raised an interrupt during the transitional state (step #1 above) then the handler of the device which uses that vector on the local CPU will be invoked. expose issues in device driver interrupt handlers which are not prepared to handle a spurious interrupt correctly. This not a regression, it's just exposing something which was already broken as spurious interrupts can happen for a lot of reasons and all driver handlers need to be able to deal with them. Reported-by: Evan Green <evgreen@chromium.org> Debugged-by: Evan Green <evgreen@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Evan Green <evgreen@chromium.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/87imkr4s7n.fsf@nanos.tec.linutronix.de
| * x86/timer: Don't skip PIT setup when APIC is disabled or in legacy modeThomas Gleixner2020-01-291-5/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Tony reported a boot regression caused by the recent workaround for systems which have a disabled (clock gate off) PIT. On his machine the kernel fails to initialize the PIT because apic_needs_pit() does not take into account whether the local APIC interrupt delivery mode will actually allow to setup and use the local APIC timer. This should be easy to reproduce with acpi=off on the command line which also disables HPET. Due to the way the PIT/HPET and APIC setup ordering works (APIC setup can require working PIT/HPET) the information is not available at the point where apic_needs_pit() makes this decision. To address this, split out the interrupt mode selection from apic_intr_mode_init(), invoke the selection before making the decision whether PIT is required or not, and add the missing checks into apic_needs_pit(). Fixes: c8c4076723da ("x86/timer: Skip PIT initialization on modern chipsets") Reported-by: Anthony Buckley <tony.buckley000@gmail.com> Tested-by: Anthony Buckley <tony.buckley000@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Daniel Drake <drake@endlessm.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206125 Link: https://lore.kernel.org/r/87sgk6tmk2.fsf@nanos.tec.linutronix.de
* | x86/apic/uv: Avoid unused variable warningArnd Bergmann2020-01-171-37/+6
|/ | | | | | | | | | | | | | | | | When CONFIG_PROC_FS is disabled, the compiler warns about an unused variable: arch/x86/kernel/apic/x2apic_uv_x.c: In function 'uv_setup_proc_files': arch/x86/kernel/apic/x2apic_uv_x.c:1546:8: error: unused variable 'name' [-Werror=unused-variable] char *name = hubless ? "hubless" : "hubbed"; Simplify the code so this variable is no longer needed. Fixes: 8785968bce1c ("x86/platform/uv: Add UV Hubbed/Hubless Proc FS Files") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191212140419.315264-1-arnd@arndb.de
* Merge branch 'x86-platform-for-linus' of ↵Linus Torvalds2019-11-261-21/+163
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 platform updates from Ingo Molnar: "UV platform updates (with a 'hubless' variant) and Jailhouse updates for better UART support" * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/jailhouse: Only enable platform UARTs if available x86/jailhouse: Improve setup data version comparison x86/platform/uv: Account for UV Hubless in is_uvX_hub Ops x86/platform/uv: Check EFI Boot to set reboot type x86/platform/uv: Decode UVsystab Info x86/platform/uv: Add UV Hubbed/Hubless Proc FS Files x86/platform/uv: Setup UV functions for Hubless UV Systems x86/platform/uv: Add return code to UV BIOS Init function x86/platform/uv: Return UV Hubless System Type x86/platform/uv: Save OEM_ID from ACPI MADT probe
| * x86/platform/uv: Check EFI Boot to set reboot typeMike Travis2019-10-071-6/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change to checking for EFI Boot type from previous check on if this is a KDUMP kernel. This allows for KDUMP kernels that can handle EFI reboots. Signed-off-by: Mike Travis <mike.travis@hpe.com> Reviewed-by: Steve Wahl <steve.wahl@hpe.com> Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hedi Berriche <hedi.berriche@hpe.com> Cc: Justin Ernst <justin.ernst@hpe.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russ Anderson <russ.anderson@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190910145840.215091717@stormcage.eag.rdlabs.hpecorp.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * x86/platform/uv: Decode UVsystab InfoMike Travis2019-10-071-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Decode the hubless UVsystab passed from BIOS to the kernel saving pertinent info in a similar manner that hubbed UVsystabs are decoded. Signed-off-by: Mike Travis <mike.travis@hpe.com> Reviewed-by: Steve Wahl <steve.wahl@hpe.com> Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hedi Berriche <hedi.berriche@hpe.com> Cc: Justin Ernst <justin.ernst@hpe.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russ Anderson <russ.anderson@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190910145840.135325066@stormcage.eag.rdlabs.hpecorp.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * x86/platform/uv: Add UV Hubbed/Hubless Proc FS FilesMike Travis2019-10-071-1/+92
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Indicate to UV user utilities that UV hubless support is available on this system via the existing /proc infterface. The current interface is maintained with the addition of new /proc leaves ("hubbed", "hubless", and "oemid") that contain the specific type of UV arch this one is. Signed-off-by: Mike Travis <mike.travis@hpe.com> Reviewed-by: Steve Wahl <steve.wahl@hpe.com> Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hedi Berriche <hedi.berriche@hpe.com> Cc: Justin Ernst <justin.ernst@hpe.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russ Anderson <russ.anderson@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190910145840.055590900@stormcage.eag.rdlabs.hpecorp.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * x86/platform/uv: Setup UV functions for Hubless UV SystemsMike Travis2019-10-071-3/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add more support for UV systems that do not contain a UV Hub (AKA "hubless"). This update adds support for additional functions required: Use PCH NMI handler instead of a UV Hub NMI handler. Initialize the UV BIOS callback interface used to support specific UV functions. Signed-off-by: Mike Travis <mike.travis@hpe.com> Reviewed-by: Steve Wahl <steve.wahl@hpe.com> Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hedi Berriche <hedi.berriche@hpe.com> Cc: Justin Ernst <justin.ernst@hpe.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russ Anderson <russ.anderson@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190910145839.975787119@stormcage.eag.rdlabs.hpecorp.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * x86/platform/uv: Return UV Hubless System TypeMike Travis2019-10-071-9/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Return the type of UV hubless system for UV specific code that depends on that. Add a function to convert UV system type to bit pattern needed for is_uv_hubless(). Signed-off-by: Mike Travis <mike.travis@hpe.com> Reviewed-by: Steve Wahl <steve.wahl@hpe.com> Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hedi Berriche <hedi.berriche@hpe.com> Cc: Justin Ernst <justin.ernst@hpe.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russ Anderson <russ.anderson@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190910145839.814880843@stormcage.eag.rdlabs.hpecorp.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * x86/platform/uv: Save OEM_ID from ACPI MADT probeMike Travis2019-10-071-1/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Save the OEM_ID and OEM_TABLE_ID passed to the apic driver probe function for later use. Also, convert the char list arg passed from the kernel to a true null-terminated string. Signed-off-by: Mike Travis <mike.travis@hpe.com> Reviewed-by: Steve Wahl <steve.wahl@hpe.com> Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hedi Berriche <hedi.berriche@hpe.com> Cc: Justin Ernst <justin.ernst@hpe.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russ Anderson <russ.anderson@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190910145839.732237241@stormcage.eag.rdlabs.hpecorp.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
| |
| \
| \
| \
*---. \ Merge branches 'core-objtool-for-linus', 'x86-cleanups-for-linus' and ↵Linus Torvalds2019-11-262-12/+15
|\ \ \ \ | | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 objtool, cleanup, and apic updates from Ingo Molnar: "Objtool: - Fix a gawk 5.0 incompatibility in gen-insn-attr-x86.awk. Most distros are still on gawk 4.2.x. Cleanup: - Misc cleanups, plus the removal of obsolete code such as Calgary IOMMU support, which code hasn't seen any real testing in a long time and there's no known users left. apic: - Two changes: a cleanup and a fix for an (old) race for oneshot threaded IRQ handlers" * 'core-objtool-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/insn: Fix awk regexp warnings * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Remove unused asm/rio.h x86: Fix typos in comments x86/pci: Remove #ifdef __KERNEL__ guard from <asm/pci.h> x86/pci: Remove pci_64.h x86: Remove the calgary IOMMU driver x86/apic, x86/uprobes: Correct parameter names in kernel-doc comments x86/kdump: Remove the unused crash_copy_backup_region() x86/nmi: Remove stale EDAC include leftover * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/ioapic: Rename misnamed functions x86/ioapic: Prevent inconsistent state when moving an interrupt
| | | * x86/ioapic: Rename misnamed functionsThomas Gleixner2019-10-241-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ioapic_irqd_[un]mask() are misnomers as both functions do way more than masking and unmasking the interrupt line. Both deal with the moving the affinity of the interrupt within interrupt context. The mask/unmask is just a tiny part of the functionality. Rename them to ioapic_prepare/finish_move(), fixup the call sites and rename the related variables in the code to reflect what this is about. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Link: https://lkml.kernel.org/r/20191017101938.412489856@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | | * x86/ioapic: Prevent inconsistent state when moving an interruptThomas Gleixner2019-10-241-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is an issue with threaded interrupts which are marked ONESHOT and using the fasteoi handler: if (IS_ONESHOT()) mask_irq(); .... cond_unmask_eoi_irq() chip->irq_eoi(); if (setaffinity_pending) { mask_ioapic(); ... move_affinity(); unmask_ioapic(); } So if setaffinity is pending the interrupt will be moved and then unconditionally unmasked at the ioapic level, which is wrong in two aspects: 1) It should be kept masked up to the point where the threaded handler finished. 2) The physical chip state and the software masked state are inconsistent Guard both the mask and the unmask with a check for the software masked state. If the line is marked masked then the ioapic line is also masked, so both mask_ioapic() and unmask_ioapic() can be skipped safely. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Fixes: 3aa551c9b4c4 ("genirq: add threaded interrupt handler support") Link: https://lkml.kernel.org/r/20191017101938.321393687@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | * | x86/apic, x86/uprobes: Correct parameter names in kernel-doc commentsYi Wang2019-10-271-1/+1
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rename parameter names to the correct ones used in the function. No functional changes. [ bp: Merge two patches into a single one. ] Signed-off-by: Yi Wang <wang.yi59@zte.com.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/1571816442-22494-1-git-send-email-wang.yi59@zte.com.cn
* | | Merge tag 'printk-for-5.5' of ↵Linus Torvalds2019-11-251-22/+19
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk Pull printk updates from Petr Mladek: - Allow to print symbolic error names via new %pe modifier. - Use pr_warn() instead of the remaining pr_warning() calls. Fix formatting of the related lines. - Add VSPRINTF entry to MAINTAINERS. * tag 'printk-for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: (32 commits) checkpatch: don't warn about new vsprintf pointer extension '%pe' MAINTAINERS: Add VSPRINTF tools lib api: Renaming pr_warning to pr_warn ASoC: samsung: Use pr_warn instead of pr_warning lib: cpu_rmap: Use pr_warn instead of pr_warning trace: Use pr_warn instead of pr_warning dma-debug: Use pr_warn instead of pr_warning vgacon: Use pr_warn instead of pr_warning fs: afs: Use pr_warn instead of pr_warning sh/intc: Use pr_warn instead of pr_warning scsi: Use pr_warn instead of pr_warning platform/x86: intel_oaktrail: Use pr_warn instead of pr_warning platform/x86: asus-laptop: Use pr_warn instead of pr_warning platform/x86: eeepc-laptop: Use pr_warn instead of pr_warning oprofile: Use pr_warn instead of pr_warning of: Use pr_warn instead of pr_warning macintosh: Use pr_warn instead of pr_warning idsn: Use pr_warn instead of pr_warning ide: Use pr_warn instead of pr_warning crypto: n2: Use pr_warn instead of pr_warning ...
| * | | x86: Use pr_warn instead of pr_warningKefeng Wang2019-10-181-22/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As said in commit f2c2cbcc35d4 ("powerpc: Use pr_warn instead of pr_warning"), removing pr_warning so all logging messages use a consistent <prefix>_warn style. Let's do it. Link: http://lkml.kernel.org/r/20191018031850.48498-7-wangkefeng.wang@huawei.com To: linux-kernel@vger.kernel.org Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Robert Richter <rric@kernel.org> Cc: Darren Hart <dvhart@infradead.org> Cc: Andy Shevchenko <andy@infradead.org> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Petr Mladek <pmladek@suse.com>
* | | | x86/apic/32: Avoid bogus LDR warningsJan Beulich2019-11-051-13/+15
| |_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The removal of the LDR initialization in the bigsmp_32 APIC code unearthed a problem in setup_local_APIC(). The code checks unconditionally for a mismatch of the logical APIC id by comparing the early APIC id which was initialized in get_smp_config() with the actual LDR value in the APIC. Due to the removal of the bogus LDR initialization the check now can trigger on bigsmp_32 APIC systems emitting a warning for every booting CPU. This is of course a false positive because the APIC is not using logical destination mode. Restrict the check and the possibly resulting fixup to systems which are actually using the APIC in logical destination mode. [ tglx: Massaged changelog and added Cc stable ] Fixes: bae3a8d3308 ("x86/apic: Do not initialize LDR and DFR for bigsmp") Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/666d8f91-b5a8-1afd-7add-821e72a35f03@suse.com
* | | x86/apic/x2apic: Fix a NULL pointer deref when handling a dying cpuSean Christopherson2019-10-151-1/+2
| |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Check that the per-cpu cluster mask pointer has been set prior to clearing a dying cpu's bit. The per-cpu pointer is not set until the target cpu reaches smp_callin() during CPUHP_BRINGUP_CPU, whereas the teardown function, x2apic_dead_cpu(), is associated with the earlier CPUHP_X2APIC_PREPARE. If an error occurs before the cpu is awakened, e.g. if do_boot_cpu() itself fails, x2apic_dead_cpu() will dereference the NULL pointer and cause a panic. smpboot: do_boot_cpu failed(-22) to wakeup CPU#1 BUG: kernel NULL pointer dereference, address: 0000000000000008 RIP: 0010:x2apic_dead_cpu+0x1a/0x30 Call Trace: cpuhp_invoke_callback+0x9a/0x580 _cpu_up+0x10d/0x140 do_cpu_up+0x69/0xb0 smp_init+0x63/0xa9 kernel_init_freeable+0xd7/0x229 ? rest_init+0xa0/0xa0 kernel_init+0xa/0x100 ret_from_fork+0x35/0x40 Fixes: 023a611748fd5 ("x86/apic/x2apic: Simplify cluster management") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20191001205019.5789-1-sean.j.christopherson@intel.com
* | Merge branch 'x86-apic-for-linus' of ↵Linus Torvalds2019-09-1714-314/+372
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 apic updates from Thomas Gleixner: - Cleanup the apic IPI implementation by removing duplicated code and consolidating the functions into the APIC core. - Implement a safe variant of the IPI broadcast mode. Contrary to earlier attempts this uses the core tracking of which CPUs have been brought online at least once so that a broadcast does not end up in some dead end in BIOS/SMM code when the CPU is still waiting for init. Once all CPUs have been brought up once, IPI broadcasting is enabled. Before that regular one by one IPIs are issued. - Drop the paravirt CR8 related functions as they have no user anymore - Initialize the APIC TPR to block interrupt 16-31 as they are reserved for CPU exceptions and should never be raised by any well behaving device. - Emit a warning when vector space exhaustion breaks the admin set affinity of an interrupt. - Make sure to use the NMI fallback when shutdown via reboot vector IPI fails. The original code had conditions which prevent the code path to be reached. - Annotate various APIC config variables as RO after init. [ The ipi broadcase change came in earlier through the cpu hotplug branch, but I left the explanation in the commit message since it was shared between the two different branches - Linus ] * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits) x86/apic/vector: Warn when vector space exhaustion breaks affinity x86/apic: Annotate global config variables as "read-only after init" x86/apic/x2apic: Implement IPI shorthands support x86/apic/flat64: Remove the IPI shorthand decision logic x86/apic: Share common IPI helpers x86/apic: Remove the shorthand decision logic x86/smp: Enhance native_send_call_func_ipi() x86/smp: Move smp_function_call implementations into IPI code x86/apic: Provide and use helper for send_IPI_allbutself() x86/apic: Add static key to Control IPI shorthands x86/apic: Move no_ipi_broadcast() out of 32bit x86/apic: Add NMI_VECTOR wait to IPI shorthand x86/apic: Remove dest argument from __default_send_IPI_shortcut() x86/hotplug: Silence APIC and NMI when CPU is dead x86/cpu: Move arch_smt_update() to a neutral place x86/apic/uv: Make x2apic_extra_bits static x86/apic: Consolidate the apic local headers x86/apic: Move apic_flat_64 header into apic directory x86/apic: Move ipi header into apic directory x86/apic: Cleanup the include maze ...
| * | x86/apic/vector: Warn when vector space exhaustion breaks affinityNeil Horman2019-08-281-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On x86, CPUs are limited in the number of interrupts they can have affined to them as they only support 256 interrupt vectors per CPU. 32 vectors are reserved for the CPU and the kernel reserves another 22 for internal purposes. That leaves 202 vectors for assignement to devices. When an interrupt is set up or the affinity is changed by the kernel or the administrator, the vector assignment code attempts to honor the requested affinity mask. If the vector space on the CPUs in that affinity mask is exhausted the code falls back to a wider set of CPUs and assigns a vector on a CPU outside of the requested affinity mask silently. While the effective affinity is reflected in the corresponding /proc/irq/$N/effective_affinity* files the silent breakage of the requested affinity can lead to unexpected behaviour for administrators. Add a pr_warn() when this happens so that adminstrators get at least informed about it in the syslog. [ tglx: Massaged changelog and made the pr_warn() more informative ] Reported-by: djuran@redhat.com Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: djuran@redhat.com Link: https://lkml.kernel.org/r/20190822143421.9535-1-nhorman@tuxdriver.com
| * | x86/apic: Annotate global config variables as "read-only after init"Sean Christopherson2019-08-071-13/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Mark the APIC's global config variables that are constant after boot as __ro_after_init to help document that the majority of the APIC config is not changed at runtime, and to harden the kernel a smidge. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190805212134.12001-1-sean.j.christopherson@intel.com
| * | x86/apic/x2apic: Implement IPI shorthands supportThomas Gleixner2019-07-253-4/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All callers of apic->send_IPI_all() and apic->send_IPI_allbutself() contain the decision logic for shorthand invocation already and invoke send_IPI_mask() if the prereqisites are not satisfied. Implement shorthand support for x2apic. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105221.134696837@linutronix.de
| * | x86/apic/flat64: Remove the IPI shorthand decision logicThomas Gleixner2019-07-252-50/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All callers of apic->send_IPI_all() and apic->send_IPI_allbutself() contain the decision logic for shorthand invocation already and invoke send_IPI_mask() if the prereqisites are not satisfied. Remove the now redundant decision logic in the APIC code and the duplicate helper in probe_64.c. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105221.042964120@linutronix.de
| * | x86/apic: Share common IPI helpersThomas Gleixner2019-07-252-18/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The 64bit implementations need the same wrappers around __default_send_IPI_shortcut() as 32bit. Move them out of the 32bit section. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105220.951534451@linutronix.de
| * | x86/apic: Remove the shorthand decision logicThomas Gleixner2019-07-251-24/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All callers of apic->send_IPI_all() and apic->send_IPI_allbutself() contain the decision logic for shorthand invocation already and invoke send_IPI_mask() if the prereqisites are not satisfied. Remove the now redundant decision logic in the 32bit implementation. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105220.860244707@linutronix.de
| * | x86/smp: Enhance native_send_call_func_ipi()Thomas Gleixner2019-07-251-13/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Nadav noticed that the cpumask allocations in native_send_call_func_ipi() are noticeable in microbenchmarks. Use the new cpumask_or_equal() function to simplify the decision whether the supplied target CPU mask is either equal to cpu_online_mask or equal to cpu_online_mask except for the CPU on which the function is invoked. cpumask_or_equal() or's the target mask and the cpumask of the current CPU together and compares it to cpu_online_mask. If the result is false, use the mask based IPI function, otherwise check whether the current CPU is set in the target mask and invoke either the send_IPI_all() or the send_IPI_allbutselt() APIC callback. Make the shorthand decision also depend on the static key which enables shorthand mode. That allows to remove the extra cpumask comparison with cpu_callout_mask. Reported-by: Nadav Amit <namit@vmware.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105220.768238809@linutronix.de
| * | x86/smp: Move smp_function_call implementations into IPI codeThomas Gleixner2019-07-251-0/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move it where it belongs. That allows to keep all the shorthand logic in one place. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105220.677835995@linutronix.de
| * | x86/apic: Provide and use helper for send_IPI_allbutself()Thomas Gleixner2019-07-251-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To support IPI shorthands wrap invocations of apic->send_IPI_allbutself() in a helper function, so the static key controlling the shorthand mode is only in one place. Fixup all callers. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105220.492691679@linutronix.de
| * | x86/apic: Add static key to Control IPI shorthandsThomas Gleixner2019-07-252-1/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The IPI shorthand functionality delivers IPI/NMI broadcasts to all CPUs in the system. This can have similar side effects as the MCE broadcasting when CPUs are waiting in the BIOS or are offlined. The kernel tracks already the state of offlined CPUs whether they have been brought up at least once so that the CR4 MCE bit is set to make sure that MCE broadcasts can't brick the machine. Utilize that information and compare it to the cpu_present_mask. If all present CPUs have been brought up at least once then the broadcast side effect is mitigated by disabling regular interrupt/IPI delivery in the APIC itself and by the cpu offline check at the begin of the NMI handler. Use a static key to switch between broadcasting via shorthands or sending the IPI/NMI one by one. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105220.386410643@linutronix.de
| * | x86/apic: Move no_ipi_broadcast() out of 32bitThomas Gleixner2019-07-253-29/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For the upcoming shorthand support for all APIC incarnations the command line option needs to be available for 64 bit as well. While at it, rename the control variable, make it static and mark it __ro_after_init. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105220.278327940@linutronix.de
| * | x86/apic: Add NMI_VECTOR wait to IPI shorthandThomas Gleixner2019-07-251-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To support NMI shorthand broadcasts add the safe wait for ICR idle for NMI vector delivery. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105220.185838026@linutronix.de
| * | x86/apic: Remove dest argument from __default_send_IPI_shortcut()Thomas Gleixner2019-07-254-14/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The SDM states: "The destination shorthand field of the ICR allows the delivery mode to be by-passed in favor of broadcasting the IPI to all the processors on the system bus and/or back to itself (see Section 10.6.1, Interrupt Command Register (ICR)). Three destination shorthands are supported: self, all excluding self, and all including self. The destination mode is ignored when a destination shorthand is used." So there is no point to supply the destination mode to the shorthand delivery function. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105220.094613426@linutronix.de
| * | x86/hotplug: Silence APIC and NMI when CPU is deadThomas Gleixner2019-07-251-11/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to support IPI/NMI broadcasting via the shorthand mechanism side effects of shorthands need to be mitigated: Shorthand IPIs and NMIs hit all CPUs including unplugged CPUs Neither of those can be handled on unplugged CPUs for obvious reasons. It would be trivial to just fully disable the APIC via the enable bit in MSR_APICBASE. But that's not possible because clearing that bit on systems based on the 3 wire APIC bus would require a hardware reset to bring it back as the APIC would lose track of bus arbitration. On systems with FSB delivery APICBASE could be disabled, but it has to be guaranteed that no interrupt is sent to the APIC while in that state and it's not clear from the SDM whether it still responds to INIT/SIPI messages. Therefore stay on the safe side and switch the APIC into soft disabled mode so it won't deliver any regular vector to the CPU. NMIs are still propagated to the 'dead' CPUs. To mitigate that add a check for the CPU being offline on early nmi entry and if so bail. Note, this cannot use the stop/restart_nmi() magic which is used in the alternatives code. A dead CPU cannot invoke nmi_enter() or anything else due to RCU and other reasons. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1907241723290.1791@nanos.tec.linutronix.de
| * | x86/apic/uv: Make x2apic_extra_bits staticThomas Gleixner2019-07-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Not used outside of the UV apic source. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105219.725264153@linutronix.de
| * | x86/apic: Consolidate the apic local headersThomas Gleixner2019-07-2512-119/+83
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now there are three small local headers. Some contain functions which are only used in one source file. Move all the inlines and declarations into a single local header and the inlines which are only used in one source file into that. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105219.618612624@linutronix.de
| * | x86/apic: Move apic_flat_64 header into apic directoryThomas Gleixner2019-07-253-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | Only used locally. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105219.526508168@linutronix.de
| * | x86/apic: Move ipi header into apic directoryThomas Gleixner2019-07-258-14/+103
| | | | | | | | | | | | | | | | | | | | | | | | | | | Only used locally. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105219.434738036@linutronix.de
| * | x86/apic: Cleanup the include mazeThomas Gleixner2019-07-259-112/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All of these APIC files include the world and some more. Remove the unneeded cruft. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105219.342631201@linutronix.de
| * | x86/apic: Move IPI inlines into ipi.cThomas Gleixner2019-07-251-3/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | No point in having them in an header file. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105219.252225936@linutronix.de
| * | x86/apic: Make apic_pending_intr_clear() more robustThomas Gleixner2019-07-251-44/+63
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In course of developing shorthand based IPI support issues with the function which tries to clear eventually pending ISR bits in the local APIC were observed. 1) O-day testing triggered the WARN_ON() in apic_pending_intr_clear(). This warning is emitted when the function fails to clear pending ISR bits or observes pending IRR bits which are not delivered to the CPU after the stale ISR bit(s) are ACK'ed. Unfortunately the function only emits a WARN_ON() and fails to dump the IRR/ISR content. That's useless for debugging. Feng added spot on debug printk's which revealed that the stale IRR bit belonged to the APIC timer interrupt vector, but adding ad hoc debug code does not help with sporadic failures in the field. Rework the loop so the full IRR/ISR contents are saved and on failure dumped. 2) The loop termination logic is interesting at best. If the machine has no TSC or cpu_khz is not known yet it tries 1 million times to ack stale IRR/ISR bits. What? With TSC it uses the TSC to calculate the loop termination. It takes a timestamp at entry and terminates the loop when: (rdtsc() - start_timestamp) >= (cpu_hkz << 10) That's roughly one second. Both methods are problematic. The APIC has 256 vectors, which means that in theory max. 256 IRR/ISR bits can be set. In practice this is impossible and the chance that more than a few bits are set is close to zero. With the pure loop based approach the 1 million retries are complete overkill. With TSC this can terminate too early in a guest which is running on a heavily loaded host even with only a couple of IRR/ISR bits set. The reason is that after acknowledging the highest priority ISR bit, pending IRRs must get serviced first before the next round of acknowledge can take place as the APIC (real and virtualized) does not honour EOI without a preceeding interrupt on the CPU. And every APIC read/write takes a VMEXIT if the APIC is virtualized. While trying to reproduce the issue 0-day reported it was observed that the guest was scheduled out long enough under heavy load that it terminated after 8 iterations. Make the loop terminate after 512 iterations. That's plenty enough in any case and does not take endless time to complete. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105219.158847694@linutronix.de
| * | x86/apic: Soft disable APIC before initializing itThomas Gleixner2019-07-251-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the APIC was already enabled on entry of setup_local_APIC() then disabling it soft via the SPIV register makes a lot of sense. That masks all LVT entries and brings it into a well defined state. Otherwise previously enabled LVTs which are not touched in the setup function stay unmasked and might surprise the just booting kernel. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105219.068290579@linutronix.de
| * | x86/apic: Invoke perf_events_lapic_init() after enabling APICThomas Gleixner2019-07-251-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the APIC is soft disabled then unmasking an LVT entry does not work and the write is ignored. perf_events_lapic_init() tries to do so. Move the invocation after the point where the APIC has been enabled. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105218.962517234@linutronix.de
| * | x86/apic: Initialize TPR to block interrupts 16-31Andy Lutomirski2019-07-221-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The APIC, per spec, is fundamentally confused and thinks that interrupt vectors 16-31 are valid. This makes no sense -- the CPU reserves vectors 0-31 for exceptions (faults, traps, etc). Obviously, no device should actually produce an interrupt with vector 16-31, but robustness can be improved by setting the APIC TPR class to 1, which will prevent delivery of an interrupt with a vector below 32. Note: This is *not* intended as a security measure against attackers who control malicious hardware. Any PCI or similar hardware that can be controlled by an attacker MUST be behind a functional IOMMU that remaps interrupts. The purpose of this change is to reduce the chance that a certain class of device malfunctions crashes the kernel in hard-to-debug ways. Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/dc04a9f8b234d7b0956a8d2560b8945bcd9c4bf7.1563117760.git.luto@kernel.org
* | | Merge branch 'x86-cpu-for-linus' of ↵Linus Torvalds2019-09-161-10/+10
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu-feature updates from Ingo Molnar: - Rework the Intel model names symbols/macros, which were decades of ad-hoc extensions and added random noise. It's now a coherent, easy to follow nomenclature. - Add new Intel CPU model IDs: - "Tiger Lake" desktop and mobile models - "Elkhart Lake" model ID - and the "Lightning Mountain" variant of Airmont, plus support code - Add the new AVX512_VP2INTERSECT instruction to cpufeatures - Remove Intel MPX user-visible APIs and the self-tests, because the toolchain (gcc) is not supporting it going forward. This is the first, lowest-risk phase of MPX removal. - Remove X86_FEATURE_MFENCE_RDTSC - Various smaller cleanups and fixes * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits) x86/cpu: Update init data for new Airmont CPU model x86/cpu: Add new Airmont variant to Intel family x86/cpu: Add Elkhart Lake to Intel family x86/cpu: Add Tiger Lake to Intel family x86: Correct misc typos x86/intel: Add common OPTDIFFs x86/intel: Aggregate microserver naming x86/intel: Aggregate big core graphics naming x86/intel: Aggregate big core mobile naming x86/intel: Aggregate big core client naming x86/cpufeature: Explain the macro duplication x86/ftrace: Remove mcount() declaration x86/PCI: Remove superfluous returns from void functions x86/msr-index: Move AMD MSRs where they belong x86/cpu: Use constant definitions for CPU models lib: Remove redundant ftrace flag removal x86/crash: Remove unnecessary comparison x86/bitops: Use __builtin_constant_p() directly instead of IS_IMMEDIATE() x86: Remove X86_FEATURE_MFENCE_RDTSC x86/mpx: Remove MPX APIs ...
| * \ \ Merge branch 'linus' into x86/cpu, to resolve conflictsIngo Molnar2019-09-023-23/+13
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: tools/power/x86/turbostat/turbostat.c Recent turbostat changes conflicted with a pending rename of x86 model names in tip:x86/cpu, sort it out. Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | x86/intel: Aggregate microserver namingPeter Zijlstra2019-08-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently big microservers have _XEON_D while small microservers have _X, Make it uniformly: _D. for i in `git grep -l "\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*_\(X\|XEON_D\)"` do sed -i -e 's/\(\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*ATOM.*\)_X/\1_D/g' \ -e 's/\(\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*\)_XEON_D/\1_D/g' ${i} done Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Tony Luck <tony.luck@intel.com> Cc: x86@kernel.org Cc: Dave Hansen <dave.hansen@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Link: https://lkml.kernel.org/r/20190827195122.677152989@infradead.org
| * | | | x86/intel: Aggregate big core graphics namingPeter Zijlstra2019-08-281-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently big core clients with extra graphics on have: - _G - _GT3E Make it uniformly: _G for i in `git grep -l "\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*_GT3E"` do sed -i -e 's/\(\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*\)_GT3E/\1_G/g' ${i} done Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Tony Luck <tony.luck@intel.com> Cc: x86@kernel.org Cc: Dave Hansen <dave.hansen@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Link: https://lkml.kernel.org/r/20190827195122.622802314@infradead.org
| * | | | x86/intel: Aggregate big core mobile namingPeter Zijlstra2019-08-281-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently big core mobile chips have either: - _L - _ULT - _MOBILE Make it uniformly: _L. for i in `git grep -l "\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*_\(MOBILE\|ULT\)"` do sed -i -e 's/\(\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*\)_\(MOBILE\|ULT\)/\1_L/g' ${i} done Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Tony Luck <tony.luck@intel.com> Cc: x86@kernel.org Cc: Dave Hansen <dave.hansen@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190827195122.568978530@infradead.org
OpenPOWER on IntegriCloud