path: root/drivers/acpi/numa.c
Commit message (Author, Age, Files, Lines)
* ACPI: NUMA: Use correct type for printing addresses on i386-PAE (Chao Fan, 2019-01-03, 1 file, -3/+3)

The addresses of NUMA nodes are not printed correctly on i386-PAE, which is misleading.

Here is a debian9-32bit with PAE in a QEMU guest having more than 4G of memory:

    qemu-system-i386 \
     -hda /var/lib/libvirt/images/debian32.qcow2 \
     -m 5G \
     -enable-kvm \
     -smp 10 \
     -numa node,mem=512M,nodeid=0,cpus=0 \
     -numa node,mem=512M,nodeid=1,cpus=1 \
     -numa node,mem=512M,nodeid=2,cpus=2 \
     -numa node,mem=512M,nodeid=3,cpus=3 \
     -numa node,mem=512M,nodeid=4,cpus=4 \
     -numa node,mem=512M,nodeid=5,cpus=5 \
     -numa node,mem=512M,nodeid=6,cpus=6 \
     -numa node,mem=512M,nodeid=7,cpus=7 \
     -numa node,mem=512M,nodeid=8,cpus=8 \
     -numa node,mem=512M,nodeid=9,cpus=9 \
     -serial stdio

Because of the wrong value type, it prints as below:

    [ 0.021049] ACPI: SRAT Memory (0x0 length 0xa0000) in proximity domain 0 enabled
    [ 0.021740] ACPI: SRAT Memory (0x100000 length 0x1ff00000) in proximity domain 0 enabled
    [ 0.022425] ACPI: SRAT Memory (0x20000000 length 0x20000000) in proximity domain 1 enabled
    [ 0.023092] ACPI: SRAT Memory (0x40000000 length 0x20000000) in proximity domain 2 enabled
    [ 0.023764] ACPI: SRAT Memory (0x60000000 length 0x20000000) in proximity domain 3 enabled
    [ 0.024431] ACPI: SRAT Memory (0x80000000 length 0x20000000) in proximity domain 4 enabled
    [ 0.025104] ACPI: SRAT Memory (0xa0000000 length 0x20000000) in proximity domain 5 enabled
    [ 0.025791] ACPI: SRAT Memory (0x0 length 0x20000000) in proximity domain 6 enabled
    [ 0.026412] ACPI: SRAT Memory (0x20000000 length 0x20000000) in proximity domain 7 enabled
    [ 0.027118] ACPI: SRAT Memory (0x40000000 length 0x20000000) in proximity domain 8 enabled
    [ 0.027802] ACPI: SRAT Memory (0x60000000 length 0x20000000) in proximity domain 9 enabled

The upper half of the start address of the NUMA domains between 6 and 9 inclusive was cut, so the printed values are incorrect.

Fix the value type, to get the correct values in the log as follows:

    [ 0.023698] ACPI: SRAT Memory (0x0 length 0xa0000) in proximity domain 0 enabled
    [ 0.024325] ACPI: SRAT Memory (0x100000 length 0x1ff00000) in proximity domain 0 enabled
    [ 0.024981] ACPI: SRAT Memory (0x20000000 length 0x20000000) in proximity domain 1 enabled
    [ 0.025659] ACPI: SRAT Memory (0x40000000 length 0x20000000) in proximity domain 2 enabled
    [ 0.026317] ACPI: SRAT Memory (0x60000000 length 0x20000000) in proximity domain 3 enabled
    [ 0.026980] ACPI: SRAT Memory (0x80000000 length 0x20000000) in proximity domain 4 enabled
    [ 0.027635] ACPI: SRAT Memory (0xa0000000 length 0x20000000) in proximity domain 5 enabled
    [ 0.028311] ACPI: SRAT Memory (0x100000000 length 0x20000000) in proximity domain 6 enabled
    [ 0.028985] ACPI: SRAT Memory (0x120000000 length 0x20000000) in proximity domain 7 enabled
    [ 0.029667] ACPI: SRAT Memory (0x140000000 length 0x20000000) in proximity domain 8 enabled
    [ 0.030334] ACPI: SRAT Memory (0x160000000 length 0x20000000) in proximity domain 9 enabled

Signed-off-by: Chao Fan <fanc.fnst@cn.fujitsu.com>
[ rjw: Subject & changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
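The truncation described in this message can be reproduced outside the kernel. The following is a minimal user-space sketch, not the kernel change itself: the function names are hypothetical, and uint32_t stands in for the 32-bit unsigned long of i386 so that the effect shows up on any host.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>	/* strcmp(), handy when checking the formatted text */

/*
 * Hypothetical helper emulating the old output: the 64-bit values are
 * narrowed to 32 bits before printing, so anything at or above 4G loses
 * its upper half.
 */
int format_srat_truncated(char *buf, size_t len, uint64_t base, uint64_t length)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "SRAT Memory (0x%" PRIx32 " length 0x%" PRIx32 ")",
			(uint32_t)base, (uint32_t)length);
}

/*
 * Hypothetical helper emulating the corrected output: the values are
 * widened to unsigned long long and printed with %llx, which is at least
 * 64 bits wide on every platform.
 */
int format_srat_full(char *buf, size_t len, uint64_t base, uint64_t length)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "SRAT Memory (0x%llx length 0x%llx)",
			(unsigned long long)base, (unsigned long long)length);
}
```

With base 0x100000000 and length 0x20000000 (proximity domain 6 in the logs), the first helper writes a start address of 0x0 and the second writes 0x100000000.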
* mm: remove include/linux/bootmem.h (Mike Rapoport, 2018-10-31, 1 file, -1/+0)

Move remaining definitions and declarations from include/linux/bootmem.h into include/linux/memblock.h and remove the redundant header.

The includes were replaced with the semantic patch below and then semi-automated removal of duplicated '#include <linux/memblock.h>'

    @@
    @@
    - #include <linux/bootmem.h>
    + #include <linux/memblock.h>

[sfr@canb.auug.org.au: dma-direct: fix up for the removal of linux/bootmem.h]
Link: http://lkml.kernel.org/r/20181002185342.133d1680@canb.auug.org.au
[sfr@canb.auug.org.au: powerpc: fix up for removal of linux/bootmem.h]
Link: http://lkml.kernel.org/r/20181005161406.73ef8727@canb.auug.org.au
[sfr@canb.auug.org.au: x86/kaslr, ACPI/NUMA: fix for linux/bootmem.h removal]
Link: http://lkml.kernel.org/r/20181008190341.5e396491@canb.auug.org.au
Link: http://lkml.kernel.org/r/1536927045-23536-30-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Serge Semin <fancer.lancer@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* acpi, numa: fix pxm to online numa node associations (Dan Williams, 2018-03-15, 1 file, -4/+6)

Commit 99759869faf1 "acpi: Add acpi_map_pxm_to_online_node()" added support for mapping a given proximity to its nearest, by SLIT distance, online node. However, it sometimes returns unexpected results because it switches from comparing against the PXM node to comparing against the last node that was closer than the current max.

    for_each_online_node(n) {
            dist = node_distance(node, n);
            if (dist < min_dist) {
                    min_dist = dist;
                    node = n;  <---- from this point we're using the
                                     wrong node for node_distance()

Fixes: 99759869faf1 ("acpi: Add acpi_map_pxm_to_online_node()")
Cc: <stable@vger.kernel.org>
Reviewed-by: Toshi Kani <toshi.kani@hp.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
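The selection bug in this message is easier to see in isolation. The following is a user-space sketch of a nearest-online-node search that keeps the reference node fixed; it is not the kernel function. MAX_NODES, the distance matrix parameter and the function name are assumptions made for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define MAX_NODES 4	/* hypothetical node count for the sketch */

/*
 * Return `node` if it is online, otherwise the online node with the
 * smallest distance from `node`, or -1 if no node is online.
 * The candidate is tracked in min_node, so every distance lookup keeps
 * using the original `node` as its reference point.
 */
int nearest_online_node(int node, int dist[MAX_NODES][MAX_NODES],
			const bool online[MAX_NODES])
{
	int min_node = -1;
	int min_dist = INT_MAX;

	assert(node >= 0 && node < MAX_NODES);

	if (online[node])
		return node;

	for (int n = 0; n < MAX_NODES; n++) {
		if (!online[n])
			continue;
		/* Always measured from the original node, never from a candidate. */
		int d = dist[node][n];
		if (d < min_dist) {
			min_dist = d;
			min_node = n;
		}
	}
	return min_node;
}
```

Overwriting `node` inside the loop, as in the quoted snippet, would make later iterations measure distances from the current candidate instead of from the node that was asked about.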
* ACPI / NUMA: ia64: Parse all entries of SRAT memory affinity table (Ganapatrao Kulkarni, 2017-11-27, 1 file, -2/+1)

In the current implementation, SRAT Memory Affinity Structure table parsing is restricted to the maximum number of memblocks allowed (NR_NODE_MEMBLKS). However, NR_NODE_MEMBLKS is defined individually per architecture. Hence, remove the restriction on SRAT Memory Affinity Structure parsing in the ACPI driver code and let architecture code check the allowed memblock count.

This check is already there in the x86 code, so do the same on ia64.

Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* ACPI: NUMA: Fix typo in the full name of SRAT (Ross Zwisler, 2017-07-24, 1 file, -1/+1)

To save someone the time of searching the ACPI spec for "Static Resource Affinity Table".

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* ACPI/NUMA: Do not map pxm to node when NUMA is turned off (Boris Ostrovsky, 2016-12-15, 1 file, -1/+1)

acpi_map_pxm_to_node() unconditionally maps nodes even when NUMA is turned off. So acpi_get_node() might return a node > 0, which is fatal when NUMA is disabled as the rest of the kernel assumes that only node 0 exists.

Expose numa_off to the acpi code and return NUMA_NO_NODE when it's set.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: linux-ia64@vger.kernel.org
Cc: catalin.marinas@arm.com
Cc: rjw@rjwysocki.net
Cc: will.deacon@arm.com
Cc: linux-acpi@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: lenb@kernel.org
Link: http://lkml.kernel.org/r/1481602709-18260-1-git-send-email-boris.ostrovsky@oracle.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* ACPI / NUMA: Enable ACPI based NUMA on ARM64 (Hanjun Guo, 2016-06-22, 1 file, -1/+36)

Add function needed for cpu to node mapping, and enable ACPI based NUMA for ARM64 in Kconfig

Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Robert Richter <rrichter@cavium.com>
[david.daney@cavium.com added ACPI_NUMA default to y for ARM64]
Signed-off-by: David Daney <david.daney@cavium.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* ACPI / NUMA: Improve SRAT error detection and add messages (David Daney, 2016-05-30, 1 file, -4/+11)

Loosely based on code from Robert Richter and Hanjun Guo.

Improve out-of-range node detection and allow for larger SRAT entities. Add printing of nice messages.

Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* ACPI / NUMA: Move acpi_numa_memory_affinity_init() to drivers/acpi/numa.c (Hanjun Guo, 2016-05-30, 1 file, -0/+60)

acpi_numa_memory_affinity_init() will be reused by arm64. Move it to drivers/acpi/numa.c to facilitate reuse.

No code change.

Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Robert Richter <rrichter@cavium.com>
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* ACPI / NUMA: move bad_srat() and srat_disabled() to drivers/acpi/numa.c (David Daney, 2016-05-30, 1 file, -0/+12)

bad_srat() and srat_disabled() are shared by x86 and follow-on arm64 patches. Move them to drivers/acpi/numa.c in preparation for arm64 support.

Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Robert Richter <rrichter@cavium.com>
[david.daney@cavium.com moved definitions to drivers/acpi/numa.c]
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* ACPI / NUMA: move acpi_numa_slit_init() to drivers/acpi/numa.c (Hanjun Guo, 2016-05-30, 1 file, -0/+29)

Identical implementations of acpi_numa_slit_init() are used by both x86 and follow-on arm64 support. Move it to drivers/acpi/numa.c, and guard with CONFIG_X86 || CONFIG_ARM64 because ia64 has its own architecture specific implementation.

No code change.

Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Robert Richter <rrichter@cavium.com>
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* ACPI / NUMA: Move acpi_numa_arch_fixup() to ia64 only (Robert Richter, 2016-05-30, 1 file, -2/+0)

Since acpi_numa_arch_fixup() is only used in arch ia64, move it there to make a generic interface easier. This avoids empty function stubs or some complex kconfig options for x86 and arm64.

Signed-off-by: Robert Richter <rrichter@cavium.com>
Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* ACPI / NUMA: remove duplicate NULL check (Hanjun Guo, 2016-05-30, 1 file, -3/+0)

The argument "header" of acpi_table_print_srat_entry() is always checked before the function is called, so checking it again is redundant. Remove the duplicate check.

Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Robert Richter <rrichter@cavium.com>
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* ACPI / NUMA: Replace ACPI_DEBUG_PRINT() with pr_debug() (Hanjun Guo, 2016-05-30, 1 file, -38/+21)

ACPI_DEBUG_PRINT is a bit fragile in acpi/numa.c. First, the component ACPI_NUMA (0x80000000) is not described in Documentation/acpi/debug.txt and is not even defined in the struct acpi_dlayer acpi_debug_layers, so we cannot dynamically enable or disable it with /sys/module/acpi/parameters/debug_layer. Second, ACPI_DEBUG_OUTPUT is controlled by ACPICA, which does not coordinate well with ACPI drivers.

Replace ACPI_DEBUG_PRINT() with pr_debug() in this patch, as pr_debug() does the same thing for debugging purposes and makes the code much cleaner. Also remove the related code that is no longer needed once ACPI_DEBUG_PRINT() is gone.

Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Robert Richter <rrichter@cavium.com>
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* ACPI / NUMA: Use pr_fmt() instead of printk (Hanjun Guo, 2016-05-30, 1 file, -10/+7)

Just do some cleanups to replace printk with pr_fmt().

Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Robert Richter <rrichter@cavium.com>
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* ACPI / SRAT: fix SRAT parsing order with both LAPIC and X2APIC present (Lukasz Anaczkowski, 2016-04-21, 1 file, -4/+12)

SRAT maps APIC IDs to proximity domain ids (PXM). Mapping from PXM to NUMA node ids is based on the order of entries in the SRAT table. The SRAT table has either just LAPIC entries or a mix of LAPIC and X2APIC entries. As long as there are only LAPIC entries, the mapping from proximity domain id to NUMA node id is as assumed by the BIOS. However, once APIC entries are mixed, X2APIC entries would be mapped first, which causes unexpected NUMA node mapping.

To fix that, change parsing to check each entry against both LAPIC and X2APIC so the mapping is in the SRAT/PXM order.

This is a supplemental change to the fix made by commit d81056b5278 (Handle apic/x2apic entries in MADT in correct order) and uses the mechanism introduced by 9b3fedd (ACPI / tables: Add acpi_subtable_proc to ACPI table parsers).

Fixes: d81056b5278 (Handle apic/x2apic entries in MADT in correct order)
Signed-off-by: Lukasz Anaczkowski <lukasz.anaczkowski@intel.com>
[ rjw : Subject & changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* ACPI: Remove FSF mailing addresses (Jarkko Nikula, 2015-07-08, 1 file, -4/+0)

There is no need to carry potentially outdated Free Software Foundation mailing address in file headers since the COPYING file includes it.

Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* acpi: Add acpi_map_pxm_to_online_node() (Toshi Kani, 2015-06-26, 1 file, -3/+47)

The kernel initializes CPU & memory's NUMA topology from the ACPI SRAT table. Some other ACPI tables, such as NFIT and DMAR, also contain proximity IDs for their device's NUMA topology. This information can be used to improve performance of these devices.

This patch introduces acpi_map_pxm_to_online_node(), which is similar to acpi_map_pxm_to_node(), but always returns an online node. When the mapped node from a given proximity ID is offline, it looks up the node distance table and returns the nearest online node.

ACPI device drivers, which are called after the NUMA initialization has completed in the kernel, can call this interface to obtain their device NUMA topology from ACPI tables. Such drivers do not have to deal with offline nodes. A node may be offline when a device proximity ID is unique, the SRAT memory entry does not exist, or NUMA is disabled, e.g. "numa=off" on x86.

This patch also moves the pxm range check from acpi_get_node() to acpi_map_pxm_to_node().

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* ACPI / table: remove duplicate NULL check for the handler of acpi_table_parse() (Hanjun Guo, 2015-02-06, 1 file, -10/+2)

In acpi_table_parse(), the pointer to the table passed to handler() is checked before handler() is called, so remove all the duplicate NULL checks in the handler functions.

CC: Tony Luck <tony.luck@intel.com>
CC: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* ACPI / numa: Use __weak, not the gcc-specific version (Bjorn Helgaas, 2014-02-03, 1 file, -1/+1)

Use "__weak" instead of the gcc-specific "__attribute__ ((weak))".

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* ACPI / numa: Make __acpi_map_pxm_to_node(), acpi_get_pxm() static (Bjorn Helgaas, 2014-02-03, 1 file, -2/+2)

__acpi_map_pxm_to_node() and acpi_get_pxm() are only used within drivers/acpi/numa.c. This makes them static and removes their declarations.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* ACPI / numa: Simplify acpi_get_node() style (Bjorn Helgaas, 2014-02-03, 1 file, -4/+4)

Simplify control flow by removing local variable initialization and returning a constant as soon as possible. No functional change.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* ACPI / numa: Fix acpi_get_node() prototype (Bjorn Helgaas, 2014-02-03, 1 file, -1/+1)

acpi_get_node() takes an acpi_handle, not an "acpi_handle *". This fixes the prototype and the definitions.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* ACPI: Clean up inclusions of ACPI header files (Lv Zheng, 2013-12-07, 1 file, -1/+0)

Replace direct inclusions of <acpi/acpi.h>, <acpi/acpi_bus.h> and <acpi/acpi_drivers.h>, which are incorrect, with <linux/acpi.h> inclusions and remove some inclusions of those files that aren't necessary.

First of all, <acpi/acpi.h>, <acpi/acpi_bus.h> and <acpi/acpi_drivers.h> should not be included directly from any files that are built for CONFIG_ACPI unset, because that generally leads to build warnings about undefined symbols in !CONFIG_ACPI builds. For CONFIG_ACPI set, <linux/acpi.h> includes those files and for CONFIG_ACPI unset it provides stub ACPI symbols to be used in that case.

Second, there are ordering dependencies between those files that always have to be met. Namely, it is required that <acpi/acpi_bus.h> be included prior to <acpi/acpi_drivers.h> so that the acpi_pci_root declarations the latter depends on are always there. And <acpi/acpi.h> which provides basic ACPICA type declarations should always be included prior to any other ACPI headers in CONFIG_ACPI builds. That also is taken care of including <linux/acpi.h> as appropriate.

Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com> (drivers/pci stuff)
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> (Xen stuff)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* ACPI / mm: use NUMA_NO_NODE (Jianguo Wu, 2013-09-24, 1 file, -2/+2)

Use more appropriate NUMA_NO_NODE instead of -1

Signed-off-by: Jianguo Wu <wujianguo@huawei.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* ACPI / numa: Fix __init attribute location in slit_valid() (Hanjun Guo, 2013-08-13, 1 file, -1/+1)

__init belongs after the return type on functions, not before it.

Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* x86, ACPI, mm: Revert movablemem_map support (Yinghai Lu, 2013-03-02, 1 file, -13/+10)

Tim found:

    WARNING: at arch/x86/kernel/smpboot.c:324 topology_sane.isra.2+0x6f/0x80()
    Hardware name: S2600CP
    sched: CPU #1's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
    smpboot: Booting Node 1, Processors #1
    Modules linked in:
    Pid: 0, comm: swapper/1 Not tainted 3.9.0-0-generic #1
    Call Trace:
      set_cpu_sibling_map+0x279/0x449
      start_secondary+0x11d/0x1e5

Don Morris reproduced on a HP z620 workstation, and bisected it to commit e8d195525809 ("acpi, memory-hotplug: parse SRAT before memblock is ready")

It turns out movable_map has some problems, and it breaks several things:

1. numa_init is called several times, NOT just for srat, so those
       nodes_clear(numa_nodes_parsed)
       memset(&numa_meminfo, 0, sizeof(numa_meminfo))
   can not be just removed. Need to consider that the sequence is: numaq, srat, amd, dummy, and make the fall back path work.

2. simply split acpi_numa_init to early_parse_srat.
   a. that early_parse_srat is NOT called for ia64, so you break ia64.
   b. for (i = 0; i < MAX_LOCAL_APIC; i++)
          set_apicid_to_node(i, NUMA_NO_NODE)
      is still left in numa_init. So it will just clear the result from early_parse_srat. It should be moved before that....
   c. it breaks ACPI_TABLE_OVERIDE...as the acpi table scan is moved early, before the override from INITRD is settled.

3. that patch TITLE is totally misleading: there is NO x86 in the title, but it changes critical x86 code. As a result the x86 guys did not pay attention and did not find the problem early. Those patches really should be routed via tip/x86/mm.

4. after that commit, the following ranges can not use movable ram:
   a. real_mode code.... well..funny, legacy Node0 [0,1M) could be hot-removed?
   b. initrd... it will be freed after booting, so it could be on movable...
   c. crashkernel for kdump...: looks like we can not put kdump kernel above 4G anymore.
   d. init_mem_mapping: can not put page table high anymore.
   e. initmem_init: vmemmap can not be high local node anymore. That is not good.

If a node is hotpluggable, the mem related ranges like page table and vmemmap could be on that node without problem and should be on that node.

We have a workaround patch that could fix some problems, but some can not be fixed.

So just remove that offending commit and related ones including:

    f7210e6c4ac7 ("mm/memblock.c: use CONFIG_HAVE_MEMBLOCK_NODE_MAP to protect movablecore_map in memblock_overlaps_region().")
    01a178a94e8e ("acpi, memory-hotplug: support getting hotplug info from SRAT")
    27168d38fa20 ("acpi, memory-hotplug: extend movablemem_map ranges to the end of node")
    e8d195525809 ("acpi, memory-hotplug: parse SRAT before memblock is ready")
    fb06bc8e5f42 ("page_alloc: bootmem limit with movablecore_map")
    42f47e27e761 ("page_alloc: make movablemem_map have higher priority")
    6981ec31146c ("page_alloc: introduce zone_movable_limit[] to keep movable limit for nodes")
    34b71f1e04fc ("page_alloc: add movable_memmap kernel parameter")
    4d59a75125d5 ("x86: get pg_data_t's memory from other node")

Later we should have patches that make sure the kernel puts page table and vmemmap on local node ram instead of pushing them down to node0. We also need to find a way to put other kernel-used ram on local node ram.

Reported-by: Tim Gardner <tim.gardner@canonical.com>
Reported-by: Don Morris <don.morris@hp.com>
Bisected-by: Don Morris <don.morris@hp.com>
Tested-by: Don Morris <don.morris@hp.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Thomas Renninger <trenn@suse.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* acpi, memory-hotplug: parse SRAT before memblock is ready (Tang Chen, 2013-02-23, 1 file, -10/+13)

On linux, the pages used by the kernel cannot be migrated. As a result, if a memory range is used by the kernel, it cannot be hot-removed. So if we want to hot-remove memory, we should prevent the kernel from using it.

The way currently used to prevent this is to specify a memory range with the movablemem_map boot option and set it as ZONE_MOVABLE.

But when the system is booting, memblock will allocate memory and reserve it for the kernel. memblock is already working before we parse SRAT and know the node memory ranges, so it may allocate memory in ranges that are to be set as ZONE_MOVABLE. This memory can be used by the kernel and never be freed.

So, let's parse SRAT before memblock is called for the first time, which is early enough.

The first call of memblock_find_in_range_node() is in:

    setup_arch()
      |-->setup_real_mode()

so this patch adds a function early_parse_srat() to parse SRAT, and calls it before setup_real_mode() is called.

NOTE:

1) early_parse_srat() is called before numa_init(), and has initialized numa_meminfo. So DO NOT clear numa_nodes_parsed in numa_init() and DO NOT zero numa_meminfo in numa_init(), otherwise we will lose memory numa info.

2) I don't know why the original acpi_numa_init() uses the count of memory affinities parsed from SRAT as its return value. So I add a static variable srat_mem_cnt to remember this count and use it as the return value of the new acpi_numa_init()

[mhocko@suse.cz: parse SRAT before memblock is ready fix]
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Wu Jianguo <wujianguo@huawei.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Len Brown <lenb@kernel.org>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'acpi-assorted' (Rafael J. Wysocki, 2013-02-15, 1 file, -2/+4)
|\
* acpi-assorted:
  ACPI: Add DMI entry for Sony VGN-FW41E_H
  ACPI: fix obsolete comment in custom_method.c
  ACPI / thermal: Use mode to enable/disable kernel thermal processing
  ACPI thermal: remove unnecessary newline from exception message
  ACPI sysfs: remove unnecessary newline from exception
  ACPI video: remove unnecessary newline from error messages
  ACPI: SRAT: report non-volatile memory in debug
  ACPI: Rework acpi_get_child() to be more efficient
| * ACPI: SRAT: report non-volatile memory in debug (Davidlohr Bueso, 2013-01-26, 1 file, -2/+4)

Just as with the other memory affinity flags, report non-volatile memory with ACPI debug.

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* | ACPICA: Cleanup table handler naming conflicts. (Lv Zheng, 2013-01-11, 1 file, -1/+1)
|/

This is a cosmetic patch only. Comparison of the resulting binary showed only line number differences.

This patch does not affect the generation of the Linux binary. This patch decreases 44 lines of 20121114 divergence.diff.

There are naming conflicts between Linux and ACPICA on table handlers. This patch cleans up these conflicts to reduce the source code diff between Linux and ACPICA.

Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* ACPI: Only count valid srat memory structures (Thomas Renninger, 2012-08-03, 1 file, -3/+5)

Otherwise you could run into:
WARN_ON in numa_register_memblks(), because node_possible_map is zero

References: https://bugzilla.novell.com/show_bug.cgi?id=757888

On this machine (ProLiant ML570 G3) the SRAT table contains:
  - No processor affinities
  - One memory affinity structure (which is set disabled)

CC: Per Jessen <per@opensuse.org>
CC: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Len Brown <len.brown@intel.com>
* ACPI: Untangle a return statement for better readability (Thomas Renninger, 2012-08-03, 1 file, -2/+4)

No functional change.

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Len Brown <len.brown@intel.com>
* ACPI: Store SRAT table revision (Kurt Garloff, 2012-01-17, 1 file, -0/+6)

In SRAT v1, we had 8bit proximity domain (PXM) fields; SRAT v2 provides 32bits for these. The new fields were reserved before. According to the ACPI spec, the OS must disregard reserved fields. To know whether the fields are reserved, we must know which version the SRAT table has.

This patch stores the SRAT table revision for later consumption by arch specific __init functions.

Signed-off-by: Kurt Garloff <kurt@garloff.de>
Signed-off-by: Len Brown <len.brown@intel.com>
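How an architecture might consume the stored revision can be sketched as follows. This is a user-space illustration under the assumption stated in the message (revision 1 tables carry only the low 8 PXM bits and the remaining bits are reserved); the function names are hypothetical and not taken from the kernel.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the proximity domain that should be used
 * for a raw 32-bit PXM field, given the SRAT table revision.
 * For revision 1 (and anything older) only the low 8 bits are defined,
 * so the formerly reserved upper bits are disregarded.
 */
uint32_t effective_pxm(uint32_t raw_pxm, uint8_t srat_revision)
{
	if (srat_revision <= 1)
		return raw_pxm & 0xffu;
	return raw_pxm;
}

/* Small usage example: garbage in reserved bits must not leak into the PXM. */
void check_effective_pxm(void)
{
	assert(effective_pxm(0xabcdef05u, 1) == 0x05u);
	assert(effective_pxm(0x00000105u, 2) == 0x105u);
}
```

Keeping the revision check in one helper means callers never have to inspect the reserved bits themselves.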
* x86-64, NUMA: Unify {acpi|amd}_{numa_init|scan_nodes}() arguments and return values (Tejun Heo, 2011-02-16, 1 file, -3/+6)

The functions used during NUMA initialization - *_numa_init() and *_scan_nodes() - have different arguments and return values. Unify them such that they all take no argument and return 0 on success and -errno on failure. This is in preparation for further NUMA init cleanups.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
* x86, ia64, acpi: Clean up x86-ism in drivers/acpi/numa.c (Tony Luck, 2011-01-12, 1 file, -6/+2)

As pointed out by Linus CONFIG_X86 in drivers/acpi/numa.c is ugly.

Builds and boots on ia64 (both normally and with maxcpus=8 to limit the number of cpus).

Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Len Brown <len.brown@intel.com>
LKML-Reference: <4D2D6B5D.4080208@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* x86, acpi: Parse all SRAT cpu entries even above the cpu number limitation (Yinghai Lu, 2010-12-23, 1 file, -2/+12)

Recent Intel systems use a different order in the MADT: they list all thread0 entries first, then all thread1 entries.

The SRAT table still uses the old order and lists the cpus of one socket together.

If the kernel was built with a limited NR_CPUS or booted with nr_cpus=, some cpus' apic id to node mappings could be missing from apicid_to_node[].

For example, a 4-socket system with 64 cpus booted with nr_cpus=32 will crash:

    [ 9.106288] Total of 32 processors activated (136190.88 BogoMIPS).
    [ 9.235021] divide error: 0000 [#1] SMP
    [ 9.235315] last sysfs file:
    [ 9.235481] CPU 1
    [ 9.235592] Modules linked in:
    [ 9.245398]
    [ 9.245478] Pid: 2, comm: kthreadd Not tainted 2.6.37-rc1-tip-yh-01782-ge92ef79-dirty #274 /Sun Fire x4800
    [ 9.265415] RIP: 0010:[<ffffffff81075a8f>] [<ffffffff81075a8f>] select_task_rq_fair+0x4f0/0x623
    ...
    [ 9.645938] RIP [<ffffffff81075a8f>] select_task_rq_fair+0x4f0/0x623
    [ 9.665356] RSP <ffff88103f8d1c40>
    [ 9.665568] ---[ end trace 2296156d35fdfc87 ]---

So just parse all cpu entries in SRAT.

Also check the apicid against MAX_LOCAL_APIC, in case we would go out of the bounds of apicid_to_node[].

This fixes the following bug too:

    https://bugzilla.kernel.org/show_bug.cgi?id=22662

-v2: expand to 32bit according to hpa; need to add MAX_LOCAL_APIC for 32bit

Reported-and-Tested-by: Wu Fengguang <fengguang.wu@intel.com>
Reported-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Tested-by: Myron Stowe <myron.stowe@hp.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4D0AD486.9020704@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
* gcc-4.6: ACPI: fix unused but set variables in ACPI (Andi Kleen, 2010-08-15, 1 file, -3/+1)
|
| Some minor improvements in error handling, but overall it was mostly
| dead code.
|
| Signed-off-by: Andi Kleen <ak@linux.intel.com>
| Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| Signed-off-by: Len Brown <len.brown@intel.com>
* ACPI: NUMA: map pxms to low node ids (David Rientjes, 2010-04-04, 1 file, -2/+4)
|
| pxms are mapped to low node ids to maintain generic kernel use of
| functions such as pxm_to_node() that are used to determine device
| affinity.
|
| Otherwise, there is no pxm-to-node and node-to-pxm matching rule for
| x86_64 users of NUMA emulation where a single pxm may be bound to
| multiple NUMA nodes.
|
| Signed-off-by: David Rientjes <rientjes@google.com>
| Signed-off-by: Len Brown <len.brown@intel.com>
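As a rough illustration of mapping proximity domains (pxms) to the lowest free node id, the following userspace sketch may help. The array sizes, the node_used[] bookkeeping, and the function names numa_map_init() and map_pxm_to_node() are assumptions for this example and do not reproduce the kernel's implementation.

```c
/* Illustrative sizes; the kernel's limits are config dependent. */
#define MAX_PXM_DOMAINS 256
#define MAX_NUMNODES    8
#define NUMA_NO_NODE    (-1)

static int pxm_to_node_map[MAX_PXM_DOMAINS];
static int node_used[MAX_NUMNODES];

/* Reset both tables so that no pxm is mapped and no node is taken. */
static void numa_map_init(void)
{
	int i;

	for (i = 0; i < MAX_PXM_DOMAINS; i++)
		pxm_to_node_map[i] = NUMA_NO_NODE;
	for (i = 0; i < MAX_NUMNODES; i++)
		node_used[i] = 0;
}

/*
 * Hypothetical helper: return the node for a pxm, assigning the lowest
 * unused node id the first time the pxm is seen.
 */
static int map_pxm_to_node(int pxm)
{
	int node;

	/* Reject a pxm that would index outside the map. */
	if (pxm < 0 || pxm >= MAX_PXM_DOMAINS)
		return NUMA_NO_NODE;

	if (pxm_to_node_map[pxm] != NUMA_NO_NODE)
		return pxm_to_node_map[pxm];

	for (node = 0; node < MAX_NUMNODES; node++) {
		if (!node_used[node]) {
			node_used[node] = 1;
			pxm_to_node_map[pxm] = node;
			return node;
		}
	}
	return NUMA_NO_NODE; /* no free node id left */
}
```

With this scheme, a machine whose firmware reports sparse pxm values such as 42 and 7 still ends up with nodes 0 and 1, and repeated lookups of the same pxm return the same node.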
* smp: Use nr_cpus= to set nr_cpu_ids early (Yinghai Lu, 2010-02-17, 1 file, -2/+2)
|
| On x86, before prefill_possible_map(), nr_cpu_ids will be NR_CPUS aka
| CONFIG_NR_CPUS.
|
| Add nr_cpus= to set nr_cpu_ids, so we can simulate a machine with
| <= 8 cpus installed on a normal config.
|
| -v2: according to Christoph, acpi_numa_init should use nr_cpu_ids
|      instead of NR_CPUS.
| -v3: add doc in kernel-parameters.txt according to Andrew.
|
| Signed-off-by: Yinghai Lu <yinghai@kernel.org>
| LKML-Reference: <1265793639-15071-34-git-send-email-yinghai@kernel.org>
| Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
| Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| Cc: Tony Luck <tony.luck@intel.com>
* Merge branch 'misc-2.6.33' into release (Len Brown, 2009-12-16, 1 file, -15/+6)
|\
| * ACPI: remove NID_INVAL (David Rientjes, 2009-12-16, 1 file, -15/+6)
| |
| | NUMA_NO_NODE has been exported globally and thus it can replace
| | NID_INVAL in the acpi code.
| |
| | Also removes the unused acpi_unmap_pxm_to_node() function.
| |
| | [akpm@linux-foundation.org: coding-style fixes]
| | Cc: Cyrill Gorcunov <gorcunov@openvz.org>
| | Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
| | Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
| | Signed-off-by: David Rientjes <rientjes@google.com>
| | Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| | Signed-off-by: Len Brown <len.brown@intel.com>
* | x86: Export srat physical topology (David Rientjes, 2009-10-12, 1 file, -4/+6)
|/
| This is the counterpart to "x86: export k8 physical topology" for
| SRAT. It is not as invasive because the acpi code already separates
| node setup into detection and registration steps, with the exception
| of registering e820 active regions in
| acpi_numa_memory_affinity_init(). This is now moved to
| acpi_scan_nodes() if NUMA emulation is disabled or deferred.
|
| acpi_numa_init() now returns a value which specifies whether an
| underlying SRAT was located. If so, that topology can be used by the
| emulation code to interleave emulated nodes over physical nodes or to
| register the nodes for ACPI.
|
| acpi_get_nodes() may now be used to export the srat physical topology
| of the machine for NUMA emulation.
|
| Signed-off-by: David Rientjes <rientjes@google.com>
| Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
| Cc: Yinghai Lu <yinghai@kernel.org>
| Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
| Cc: Ankita Garg <ankita@in.ibm.com>
| Cc: Len Brown <len.brown@intel.com>
| LKML-Reference: <alpine.DEB.1.00.0909251518580.14754@chino.kir.corp.google.com>
| Signed-off-by: Ingo Molnar <mingo@elte.hu>
* ACPI: Move definition of PREFIX from acpi_bus.h to internal.h (Len Brown, 2009-08-28, 1 file, -0/+2)
|
| Linux/ACPI core files using internal.h all PREFIX "ACPI: ", however,
| not all ACPI drivers use/want it -- and they should not have to
| #undef PREFIX to define their own.
|
| Add GPL comment to internal.h while we are there.
|
| This does not change any actual console output, aside from a
| whitespace fix.
|
| Signed-off-by: Len Brown <len.brown@intel.com>
* x86, ACPI: add support for x2apic ACPI extensions (Suresh Siddha, 2009-04-03, 1 file, -1/+45)
|
| All logical processors with APIC ID values of 255 and greater will
| have their APIC reported through Processor X2APIC structure (type-9
| entry type) and all logical processors with APIC ID less than 255
| will have their APIC reported through legacy Processor Local APIC
| (type-0 entry type) only. This is the same case even for NMI
| structure reporting.
|
| The Processor X2APIC Affinity structure provides the association
| between the X2APIC ID of a logical processor and the proximity domain
| to which the logical processor belongs.
|
| For OSPM, Processor IDs outside the 0-254 range are to be declared as
| Device() objects in the ACPI namespace.
|
| Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
| Signed-off-by: Len Brown <len.brown@intel.com>
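The reporting rule stated in this commit message (APIC ID below 255 via the type-0 entry, 255 and above via the type-9 entry) can be written down as a tiny sketch. The enum and function names are hypothetical and chosen only for this example; only the two type numbers and the 255 threshold come from the text above.

```c
#include <stdint.h>

/* Entry type numbers as given in the commit message. */
enum madt_entry_type_sketch {
	MADT_TYPE_LOCAL_APIC_SKETCH   = 0, /* legacy Processor Local APIC */
	MADT_TYPE_LOCAL_X2APIC_SKETCH = 9  /* Processor X2APIC structure */
};

/*
 * Hypothetical helper: which entry type is expected to report a logical
 * processor with the given APIC ID.
 */
static enum madt_entry_type_sketch madt_entry_type_for(uint32_t apic_id)
{
	if (apic_id < 255)
		return MADT_TYPE_LOCAL_APIC_SKETCH;
	return MADT_TYPE_LOCAL_X2APIC_SKETCH;
}
```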
* acpi: check for pxm_to_node_map overflow (Cyrill Gorcunov, 2009-03-16, 1 file, -1/+1)
|
| It is hardly (if ever) possible, but with a broken _PXM entry we
| could index past the bounds of the pxm_to_node_map array in the
| acpi_map_pxm_to_node() call.
|
| Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
| Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| Signed-off-by: Len Brown <len.brown@intel.com>
* ACPI: remove private acpica headers from driver files (Lin Ming, 2008-12-31, 1 file, -1/+0)
|
| External driver files should not include any private acpica headers.
|
| Signed-off-by: Lin Ming <ming.m.lin@intel.com>
| Signed-off-by: Len Brown <len.brown@intel.com>
* ACPI: Change acpi_evaluate_integer to support 64-bit on 32-bit kernels (Matthew Wilcox, 2008-10-11, 1 file, -1/+1)
|
| As of version 2.0, ACPI can return 64-bit integers. The current
| acpi_evaluate_integer only supports 64-bit integers on 64-bit
| platforms. Change the argument to take a pointer to an acpi_integer
| so we support 64-bit integers on all platforms.
|
| lenb: replaced use of "acpi_integer" with "unsigned long long"
| lenb: fixed bug in acpi_thermal_trips_update()
|
| Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
| Signed-off-by: Len Brown <len.brown@intel.com>
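Why the result type matters can be shown with a short userspace sketch. It models a 32-bit kernel's unsigned long with uint32_t; both function names are hypothetical and neither is the real acpi_evaluate_integer() interface.

```c
#include <stdint.h>

/*
 * Models the old behaviour on a 32-bit kernel: the result lands in a
 * 32-bit wide variable, so the upper half of a 64-bit value is lost.
 */
static uint32_t result_as_32bit_long(unsigned long long acpi_value)
{
	return (uint32_t)acpi_value;
}

/*
 * Models the new behaviour: the caller passes a pointer to an
 * unsigned long long, which holds the full 64-bit value on any platform.
 * Returns 0 on success, -1 if no output pointer was given.
 */
static int evaluate_integer_sketch(unsigned long long acpi_value,
				   unsigned long long *data)
{
	if (!data)
		return -1;
	*data = acpi_value;
	return 0;
}
```

A value such as 0x100000000 becomes 0 through the first function but survives intact through the second, which is the same class of truncation that produced the wrong SRAT addresses on i386-PAE described at the top of this log.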
* ACPICA: Update DMAR and SRAT table definitions (Bob Moore, 2008-07-16, 1 file, -2/+2)
|
| Synchronized tables with current specifications.
|
| Signed-off-by: Bob Moore <robert.moore@intel.com>
| Signed-off-by: Lin Ming <ming.m.lin@intel.com>
| Signed-off-by: Len Brown <len.brown@intel.com>
| Signed-off-by: Andi Kleen <ak@linux.intel.com>
* ACPI: handle invalid ACPI SLIT table (Fenghua Yu, 2008-06-11, 1 file, -4/+27)
|
| This is a SLIT sanity checking patch. It moves slit_valid() function
| to generic ACPI code and does sanity checking for both x86 and ia64.
| It sets up node_distance with LOCAL_DISTANCE and REMOTE_DISTANCE when
| hitting invalid SLIT table on ia64. It also cleans up unused variable
| localities in acpi_parse_slit() on x86.
|
| Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
| Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| Signed-off-by: Len Brown <len.brown@intel.com>
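A SLIT sanity check of the kind described here can be sketched in userspace as follows. The flat-array signature is an assumption made to keep the example self-contained (the kernel function takes the ACPI table structure), and LOCAL_DISTANCE is set to 10, the value for a node's distance to itself.

```c
#include <stdint.h>

/* Distance of a node to itself. */
#define LOCAL_DISTANCE 10

/*
 * Sketch of a SLIT validity check over a row-major distance matrix:
 *  - every diagonal entry must equal LOCAL_DISTANCE
 *  - every off-diagonal entry must be greater than LOCAL_DISTANCE
 * Returns 1 if the matrix looks sane, 0 otherwise.
 */
static int slit_valid_sketch(const uint8_t *entry, int locality_count)
{
	int i, j;

	for (i = 0; i < locality_count; i++) {
		for (j = 0; j < locality_count; j++) {
			uint8_t val = entry[locality_count * i + j];

			if (i == j) {
				if (val != LOCAL_DISTANCE)
					return 0;
			} else if (val <= LOCAL_DISTANCE) {
				return 0;
			}
		}
	}
	return 1;
}
```

When such a check fails, a caller can fall back to default distances instead of trusting the firmware table, which is what the commit message describes for ia64.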