path: root/arch/x86/mm
* x86, efi: Do not reserve boot services regions within reserved areas
  Maarten Lankhorst, 2011-06-18 (1 file, -2/+2)

  Commit 916f676f8dc started reserving boot services code, since some systems require that code to be kept around until SetVirtualAddressMap is called. However, in some cases those areas overlap with reserved regions. The proper medium-term fix is to fix the bootloader to prevent the conflicts from occurring by moving the kernel to a better position, but the kernel should check for this possibility and only reserve regions which can be reserved.

  Signed-off-by: Maarten Lankhorst <m.b.lankhorst@gmail.com>
  Link: http://lkml.kernel.org/r/4DF7A005.1050407@gmail.com
  Acked-by: Matthew Garrett <mjg@redhat.com>
  Signed-off-by: H. Peter Anvin <hpa@zytor.com>
  Signed-off-by: Ingo Molnar <mingo@elte.hu>

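  A minimal C sketch of the idea, assuming a hypothetical overlaps_reserved() query in place of the kernel's real e820/memblock checks; all names here are illustrative stand-ins, not the actual EFI code:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    struct region { uint64_t start, size; };

    /* stand-in for "does any part of this range overlap a reserved area?" */
    static bool overlaps_reserved(uint64_t start, uint64_t size)
    {
        return start < 0x100000; /* pretend the low 1MiB is reserved */
    }

    /* only reserve regions which can be reserved; skip conflicting ones */
    static void reserve_boot_services_region(const struct region *r)
    {
        if (overlaps_reserved(r->start, r->size)) {
            printf("skipping %#llx+%#llx: overlaps reserved area\n",
                   (unsigned long long)r->start, (unsigned long long)r->size);
            return;
        }
        printf("reserving %#llx+%#llx\n",
               (unsigned long long)r->start, (unsigned long long)r->size);
    }

    int main(void)
    {
        struct region ok = { 0x200000, 0x1000 }, bad = { 0x9f000, 0x2000 };

        reserve_boot_services_region(&ok);
        reserve_boot_services_region(&bad);
        return 0;
    }
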
* x86: Move do_page_fault()'s error path under unlikely()
  KOSAKI Motohiro, 2011-05-26 (1 file, -15/+20)

  Ingo suggested that the SIGKILL check should be moved into the slowpath function. This reduces the page fault fastpath impact of the recent commit 37b23e0525d3 ("x86,mm: make pagefault killable").

  Suggested-by: Ingo Molnar <mingo@elte.hu>
  Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
  Cc: kamezawa.hiroyu@jp.fujitsu.com
  Cc: minchan.kim@gmail.com
  Cc: willy@linux.intel.com
  Cc: Andrew Morton <akpm@linux-foundation.org>
  Cc: Linus Torvalds <torvalds@linux-foundation.org>
  Link: http://lkml.kernel.org/r/4DDE0B5C.9050907@jp.fujitsu.com
  Signed-off-by: Ingo Molnar <mingo@elte.hu>

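  A small compilable sketch of the fastpath/slowpath split described above; the helpers are illustrative, not the actual do_page_fault() code, but unlikely() mirrors the kernel's own macro:

    #include <stdio.h>

    #define unlikely(x) __builtin_expect(!!(x), 0)

    /* rare-path work lives out of line so the fast path stays small */
    static void __attribute__((noinline)) fault_error_path(int err)
    {
        /* e.g. check for a pending SIGKILL, deliver signals, handle OOM */
        printf("slow path: error %d\n", err);
    }

    static void handle_fault(int err)
    {
        if (unlikely(err)) { /* compiler lays this branch off the hot path */
            fault_error_path(err);
            return;
        }
        /* common case: fault handled, return to userspace */
    }

    int main(void)
    {
        handle_fault(0);  /* fast path */
        handle_fault(-1); /* slow path */
        return 0;
    }
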
* mm: Convert i_mmap_lock to a mutex
  Peter Zijlstra, 2011-05-25 (1 file, -2/+2)

  Straightforward conversion of i_mmap_lock to a mutex.

  Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
  Acked-by: Hugh Dickins <hughd@google.com>
  Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  Cc: David Miller <davem@davemloft.net>
  Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
  Cc: Russell King <rmk@arm.linux.org.uk>
  Cc: Paul Mundt <lethal@linux-sh.org>
  Cc: Jeff Dike <jdike@addtoit.com>
  Cc: Richard Weinberger <richard@nod.at>
  Cc: Tony Luck <tony.luck@intel.com>
  Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
  Cc: Mel Gorman <mel@csn.ul.ie>
  Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
  Cc: Nick Piggin <npiggin@kernel.dk>
  Cc: Namhyung Kim <namhyung@gmail.com>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

* mm: now that all old mmu_gather code is gone, remove the storage
  Peter Zijlstra, 2011-05-25 (1 file, -2/+0)

  Fold all the mmu_gather rework patches into one for submission.

  Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
  Reported-by: Hugh Dickins <hughd@google.com>
  Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  Cc: David Miller <davem@davemloft.net>
  Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
  Cc: Russell King <rmk@arm.linux.org.uk>
  Cc: Paul Mundt <lethal@linux-sh.org>
  Cc: Jeff Dike <jdike@addtoit.com>
  Cc: Richard Weinberger <richard@nod.at>
  Cc: Tony Luck <tony.luck@intel.com>
  Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
  Cc: Mel Gorman <mel@csn.ul.ie>
  Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
  Cc: Nick Piggin <npiggin@kernel.dk>
  Cc: Namhyung Kim <namhyung@gmail.com>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

* x86,mm: make pagefault killable
  KOSAKI Motohiro, 2011-05-25 (1 file, -1/+11)

  When an OOM kill occurs, almost all processes get stuck at one of the following two points:

  1) __alloc_pages_nodemask
  2) __lock_page_or_retry

  (1) is not very problematic because TIF_MEMDIE leads to an allocation failure and a bail-out from the page allocator. (2) is more problematic: in an OOM situation, zones typically have no page cache at all, and memory starvation might greatly reduce IO performance. When a fork bomb occurs, TIF_MEMDIE tasks don't die quickly, meaning the fork bomb may create new processes faster than the oom-killer can kill them. The system may then become livelocked.

  This patch makes the page fault interruptible by SIGKILL.

  Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
  Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
  Cc: Minchan Kim <minchan.kim@gmail.com>
  Cc: Matthew Wilcox <willy@linux.intel.com>
  Cc: Ingo Molnar <mingo@elte.hu>
  Cc: Thomas Gleixner <tglx@linutronix.de>
  Cc: "H. Peter Anvin" <hpa@zytor.com>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

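  A rough C sketch of the killable-retry idea with stand-in names: task_has_fatal_signal() approximates fatal_signal_pending(current), and the flag values are invented for illustration, not the kernel's VM_FAULT_* constants:

    #include <stdbool.h>

    #define FAULT_RETRY  1
    #define FAULT_KILLED 2

    static bool task_has_fatal_signal(void)
    {
        return false; /* stand-in for fatal_signal_pending(current) */
    }

    static int handle_fault_killable(void)
    {
        int ret = FAULT_RETRY; /* pretend the mm layer asked us to retry */

        /* bail out instead of retrying forever if a SIGKILL is pending,
         * so an OOM victim can actually die and release its memory */
        if (ret == FAULT_RETRY && task_has_fatal_signal())
            return FAULT_KILLED;
        return ret;
    }

    int main(void)
    {
        return handle_fault_killable() == FAULT_KILLED;
    }
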
* Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
  Linus Torvalds, 2011-05-23 (1 file, -11/+3)

  * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86: Eliminate various 'set but not used' warnings
    x86, SMEP: Fix section mismatch warnings
    x86, amd: Use _safe() msr access for GartTlbWlk disable code

| * x86: Eliminate various 'set but not used' warnings
  Gustavo F. Padovan, 2011-05-21 (1 file, -11/+3)

  Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
  Cc: Joerg Roedel <joerg.roedel@amd.com> (supporter: AMD IOMMU (AMD-VI))
  Cc: iommu@lists.linux-foundation.org (open list: AMD IOMMU (AMD-VI))
  Link: http://lkml.kernel.org/r/1305918786-7239-3-git-send-email-padovan@profusion.mobi
  Signed-off-by: Ingo Molnar <mingo@elte.hu>

* | sanitize <linux/prefetch.h> usage
  Linus Torvalds, 2011-05-20 (1 file, -0/+1)

  Commit e66eed651fd1 ("list: remove prefetching from regular list iterators") removed the include of prefetch.h from list.h, which uncovered several cases that had apparently relied on that rather obscure header file dependency. So this fixes things up a bit, using

    grep -L linux/prefetch.h $(git grep -l '[^a-z_]prefetchw*(' -- '*.[ch]')
    grep -L 'prefetchw*(' $(git grep -l 'linux/prefetch.h' -- '*.[ch]')

  to guide us in finding files that either need <linux/prefetch.h> inclusion or have it despite not needing it. There are more of them around (mostly network drivers), but this gets many core ones.

  Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
  Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

* Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
  Linus Torvalds, 2011-05-19 (12 files, -1342/+696)

  * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (50 commits)
    x86, mm: Allow ZONE_DMA to be configurable
    x86, NUMA: Trim numa meminfo with max_pfn in a separate loop
    x86, NUMA: Rename setup_node_bootmem() to setup_node_data()
    x86, NUMA: Enable emulation on 32bit too
    x86, NUMA: Enable CONFIG_AMD_NUMA on 32bit too
    x86, NUMA: Rename amdtopology_64.c to amdtopology.c
    x86, NUMA: Make numa_init_array() static
    x86, NUMA: Make 32bit use common NUMA init path
    x86, NUMA: Initialize and use remap allocator from setup_node_bootmem()
    x86-32, NUMA: Add @start and @end to init_alloc_remap()
    x86, NUMA: Remove long 64bit assumption from numa.c
    x86, NUMA: Enable build of generic NUMA init code on 32bit
    x86, NUMA: Move NUMA init logic from numa_64.c to numa.c
    x86-32, NUMA: Update numaq to use new NUMA init protocol
    x86-32, NUMA: Replace srat_32.c with srat.c
    x86-32, NUMA: implement temporary NUMA init shims
    x86, NUMA: Move numa_nodes_parsed to numa.[hc]
    x86-32, NUMA: Move get_memcfg_numa() into numa_32.c
    x86, NUMA: make srat.c 32bit safe
    x86, NUMA: rename srat_64.c to srat.c
    ...

| * x86, mm: Allow ZONE_DMA to be configurable
  David Rientjes, 2011-05-16 (2 files, -0/+4)

  ZONE_DMA is unnecessary for a large number of machines that do not require less than 32-bit DMA addressing, e.g. ISA legacy DMA or PCI cards with a restricted DMA address mask.

  This patch allows users to disable ZONE_DMA for x86 if they know they will not be using such devices with their kernel.

  This prevents the VM from unnecessarily reserving a ratio of memory (defaulting to 1/256th of system capacity) with lowmem_reserve_ratio for such allocations when it will never be used.

  Signed-off-by: David Rientjes <rientjes@google.com>
  Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1105161353560.4353@chino.kir.corp.google.com
  Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

| * x86, NUMA: Trim numa meminfo with max_pfn in a separate loop
  Yinghai Lu, 2011-05-02 (1 file, -5/+10)

  While testing the 32bit NUMA unification code from tj, one system with more than 64GiB failed to use NUMA. It turned out we did not trim numa_meminfo correctly against max_pfn when the start address of a node was higher than 64GiB. The bug fix made it to the tip tree.

  This patch moves the checking and trimming to a separate loop, so we don't need to compare low/high in the following merge loops. It makes the code more readable, and it also makes the node merge printouts less strange. On a 512GiB NUMA system with 32bit, before:

    > NUMA: Node 0 [0,a0000) + [100000,80000000) -> [0,80000000)
    > NUMA: Node 0 [0,80000000) + [100000000,1080000000) -> [0,1000000000)

  after:

    > NUMA: Node 0 [0,a0000) + [100000,80000000) -> [0,80000000)
    > NUMA: Node 0 [0,80000000) + [100000000,1000000000) -> [0,1000000000)

  Signed-off-by: Yinghai Lu <yinghai@kernel.org>
  [Updated patch description and comment slightly.]
  Signed-off-by: Tejun Heo <tj@kernel.org>

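  A sketch of the separate trimming pass in C, assuming a simplified block array rather than the kernel's actual numa_meminfo layout:

    #include <stdint.h>

    struct memblk { uint64_t start, end; };

    static int trim_meminfo(struct memblk *blk, int nr, uint64_t max_addr)
    {
        int i, out = 0;

        /* first pass: clamp every block against max_addr and drop empty
         * ones, so later merge loops never re-check the upper bound */
        for (i = 0; i < nr; i++) {
            if (blk[i].start >= max_addr)
                continue; /* entirely above usable memory */
            if (blk[i].end > max_addr)
                blk[i].end = max_addr;
            blk[out++] = blk[i];
        }
        return out; /* new number of blocks */
    }

    int main(void)
    {
        struct memblk blks[] = { { 0, 1 << 20 }, { 1 << 21, 1 << 22 } };

        return trim_meminfo(blks, 2, 1 << 21) != 1;
    }
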
| * x86, NUMA: Rename setup_node_bootmem() to setup_node_data()
  Yinghai Lu, 2011-05-02 (1 file, -3/+3)

  After memblock replaced bootmem, that function only sets up node_data now. Change the name to reflect what it actually does.

  tj: Minor adjustment to the patch description.

  Signed-off-by: Yinghai Lu <yinghai@kernel.org>
  Signed-off-by: Tejun Heo <tj@kernel.org>

| * x86, NUMA: Enable emulation on 32bit too
  Tejun Heo, 2011-05-02 (1 file, -5/+11)

  Now that the NUMA init path is unified, NUMA emulation can be enabled on 32bit. Make numa_emulation.c safe on 32bit by doing the following:

  * Define MAX_DMA32_PFN on 32bit too.
  * Include bootmem.h for the max_pfn declaration.
  * Use u64 explicitly and always use PFN_PHYS() when converting page numbers to addresses.
  * Avoid __udivdi3() generation on 32bit by doing the calculation in number of pages instead in split_nodes_interleave() (see the sketch below).

  Also drop the X86_64 dependency from Kconfig.

  Signed-off-by: Tejun Heo <tj@kernel.org>
  Cc: Ingo Molnar <mingo@redhat.com>
  Cc: Yinghai Lu <yinghai@kernel.org>
  Cc: David Rientjes <rientjes@google.com>
  Cc: Thomas Gleixner <tglx@linutronix.de>
  Cc: "H. Peter Anvin" <hpa@zytor.com>

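  A sketch of the __udivdi3 workaround, under the assumption that the goal is splitting a physical byte range among nodes: dividing a u64 on x86-32 pulls in libgcc's __udivdi3, so divide a page count held in unsigned long instead and shift back to bytes:

    #include <stdint.h>

    #define PAGE_SHIFT 12

    static uint64_t size_per_node(uint64_t start, uint64_t end, int nr_nodes)
    {
        /* page counts fit in unsigned long even on 32bit for realistic
         * memory sizes, so this division needs no 64bit libgcc helper */
        unsigned long pages = (unsigned long)((end - start) >> PAGE_SHIFT);
        unsigned long per_node = pages / nr_nodes;

        return (uint64_t)per_node << PAGE_SHIFT; /* back to bytes */
    }

    int main(void)
    {
        return size_per_node(0, 1ULL << 32, 4) != (1ULL << 30);
    }
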
| * x86, NUMA: Enable CONFIG_AMD_NUMA on 32bit too
  Tejun Heo, 2011-05-02 (1 file, -10/+11)

  Now that the NUMA init path is unified, amdtopology can be enabled on 32bit. Make amdtopology.c safe on 32bit by explicitly using u64, and drop the X86_64 dependency from Kconfig. An include of bootmem.h is added for the max_pfn declaration.

  Signed-off-by: Tejun Heo <tj@kernel.org>
  Cc: Ingo Molnar <mingo@redhat.com>
  Cc: Yinghai Lu <yinghai@kernel.org>
  Cc: David Rientjes <rientjes@google.com>
  Cc: Thomas Gleixner <tglx@linutronix.de>
  Cc: "H. Peter Anvin" <hpa@zytor.com>

| * x86, NUMA: Rename amdtopology_64.c to amdtopology.c
  Tejun Heo, 2011-05-02 (2 files, -1/+1)

  amdtopology is going to be used by 32bit too, so drop the _64 suffix. This is a pure rename.

  Signed-off-by: Tejun Heo <tj@kernel.org>
  Cc: Ingo Molnar <mingo@redhat.com>
  Cc: Yinghai Lu <yinghai@kernel.org>
  Cc: David Rientjes <rientjes@google.com>
  Cc: Thomas Gleixner <tglx@linutronix.de>
  Cc: "H. Peter Anvin" <hpa@zytor.com>

| * x86, NUMA: Make numa_init_array() static
  Tejun Heo, 2011-05-02 (1 file, -1/+1)

  numa_init_array() no longer has users outside of numa.c. Make it static.

  Signed-off-by: Tejun Heo <tj@kernel.org>
  Cc: Ingo Molnar <mingo@redhat.com>
  Cc: Yinghai Lu <yinghai@kernel.org>
  Cc: David Rientjes <rientjes@google.com>
  Cc: Thomas Gleixner <tglx@linutronix.de>
  Cc: "H. Peter Anvin" <hpa@zytor.com>

| * x86, NUMA: Make 32bit use common NUMA init path
  Tejun Heo, 2011-05-02 (2 files, -234/+7)

  With both _numa_init() methods converted and the rest of the init code adjusted, numa_32.c can now switch from the 32bit-only init code to the common one in numa.c.

  * Shim get_memcfg_*()'s are dropped and initmem_init() calls x86_numa_init(), which is updated to handle NUMAQ.
  * All boilerplate operations including node range limiting and pgdat alloc/init are handled by numa_init(). The 32bit-only implementation is removed.
  * 32bit numa_add_memblk(), numa_set_distance() and memory_add_physaddr_to_nid() removed and common versions in numa_32.c enabled for 32bit.

  This change causes the following behavior changes:

  * NODE_DATA()->node_start_pfn/node_spanned_pages are properly initialized for 32bit too.
  * Many more sanity checks and configuration cleanups.
  * Proper handling of node distances.
  * The same NUMA init messages as 64bit.

  Signed-off-by: Tejun Heo <tj@kernel.org>
  Cc: Ingo Molnar <mingo@redhat.com>
  Cc: Yinghai Lu <yinghai@kernel.org>
  Cc: David Rientjes <rientjes@google.com>
  Cc: Thomas Gleixner <tglx@linutronix.de>
  Cc: "H. Peter Anvin" <hpa@zytor.com>

| * x86, NUMA: Initialize and use remap allocator from setup_node_bootmem()
  Tejun Heo, 2011-05-02 (3 files, -15/+34)

  setup_node_bootmem() is taken from 64bit and doesn't use the remap allocator. It's about to be shared with 32bit, so add support for it. If NODE_DATA is remapped, that is noted in the debug message and the node locality check is skipped, as the __pa() of the remapped address doesn't reflect the actual physical address.

  On 64bit, the remap allocator becomes a noop and doesn't affect the behavior.

  Signed-off-by: Tejun Heo <tj@kernel.org>
  Cc: Ingo Molnar <mingo@redhat.com>
  Cc: Yinghai Lu <yinghai@kernel.org>
  Cc: David Rientjes <rientjes@google.com>
  Cc: Thomas Gleixner <tglx@linutronix.de>
  Cc: "H. Peter Anvin" <hpa@zytor.com>

| * x86-32, NUMA: Add @start and @end to init_alloc_remap()
  Tejun Heo, 2011-05-02 (1 file, -15/+14)

  Instead of dereferencing node_start/end_pfn[] directly, make init_alloc_remap() take @start and @end and let the caller be responsible for making sure the range is sane. This is to prepare for use from unified NUMA init code.

  Signed-off-by: Tejun Heo <tj@kernel.org>
  Cc: Ingo Molnar <mingo@redhat.com>
  Cc: Yinghai Lu <yinghai@kernel.org>
  Cc: David Rientjes <rientjes@google.com>
  Cc: Thomas Gleixner <tglx@linutronix.de>
  Cc: "H. Peter Anvin" <hpa@zytor.com>

| * x86, NUMA: Remove long 64bit assumption from numa.c
  Tejun Heo, 2011-05-02 (1 file, -23/+22)

  Code moved from numa_64.c assumed that long is 64bit in several places. This patch removes the assumption by using s64/u64 explicitly, using PFN_PHYS() for page number -> address conversions, and adjusting printf formats.

  Signed-off-by: Tejun Heo <tj@kernel.org>
  Cc: Ingo Molnar <mingo@redhat.com>
  Cc: Yinghai Lu <yinghai@kernel.org>
  Cc: David Rientjes <rientjes@google.com>
  Cc: Thomas Gleixner <tglx@linutronix.de>
  Cc: "H. Peter Anvin" <hpa@zytor.com>

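  A compilable illustration of the 32bit "long" pitfall this removes: shifting a pfn held in unsigned long overflows for memory above 4GiB, so conversions must widen first, which is exactly what PFN_PHYS() does in the kernel:

    #include <stdint.h>

    #define PAGE_SHIFT 12

    /* broken on 32bit: the shift happens in 32-bit arithmetic and the
     * high bits of addresses above 4GiB are silently lost */
    static unsigned long pfn_to_addr_buggy(unsigned long pfn)
    {
        return pfn << PAGE_SHIFT;
    }

    /* safe: widen to u64 before shifting, as PFN_PHYS() does */
    static uint64_t pfn_to_addr(unsigned long pfn)
    {
        return (uint64_t)pfn << PAGE_SHIFT;
    }

    int main(void)
    {
        /* on a 32bit build these disagree for pfns above 1M pages */
        return pfn_to_addr(0x100000) == (uint64_t)pfn_to_addr_buggy(0x100000);
    }
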
| * x86, NUMA: Enable build of generic NUMA init code on 32bit
  Tejun Heo, 2011-05-02 (2 files, -9/+4)

  Generic NUMA init code was moved to numa.c from numa_64.c but is still guarded by CONFIG_X86_64. This patch removes the compile guard and enables compiling on 32bit.

  * numa_add_memblk() and numa_set_distance() clash with the shim implementations in numa_32.c and are left out.
  * memory_add_physaddr_to_nid() clashes with the 32bit implementation and is left out.
  * The MAX_DMA_PFN definition in dma.h is moved out of !CONFIG_X86_32.
  * The node_data definition in numa_32.c is removed in favor of the one in numa.c.

  There are places where ulong is assumed to be 64bit. The next patch will fix them up. Note that although the code is compiled, it isn't used yet and this patch doesn't cause any functional change.

  Signed-off-by: Tejun Heo <tj@kernel.org>
  Cc: Ingo Molnar <mingo@redhat.com>
  Cc: Yinghai Lu <yinghai@kernel.org>
  Cc: David Rientjes <rientjes@google.com>
  Cc: Thomas Gleixner <tglx@linutronix.de>
  Cc: "H. Peter Anvin" <hpa@zytor.com>

| * x86, NUMA: Move NUMA init logic from numa_64.c to numa.c
  Tejun Heo, 2011-05-02 (3 files, -505/+523)

  Move the generic 64bit NUMA init machinery from numa_64.c to numa.c:

  * node_data[], numa_mem_info and numa_distance
  * numa_add_memblk[_to](), numa_remove_memblk[_from]()
  * numa_set_distance() and friends
  * numa_init() and all the numa_meminfo handling helpers called from it
  * dummy_numa_init()
  * memory_add_physaddr_to_nid()

  A new function x86_numa_init() is added and the content of numa_64.c::initmem_init() is moved into it. initmem_init() now simply calls x86_numa_init().

  Constants and the numa_off declaration are moved from numa_{32|64}.h to numa.h.

  This is code reorganization and doesn't involve any functional change.

  Signed-off-by: Tejun Heo <tj@kernel.org>
  Cc: Ingo Molnar <mingo@redhat.com>
  Cc: Yinghai Lu <yinghai@kernel.org>
  Cc: David Rientjes <rientjes@google.com>
  Cc: Thomas Gleixner <tglx@linutronix.de>
  Cc: "H. Peter Anvin" <hpa@zytor.com>

| * x86-32, NUMA: Update numaq to use new NUMA init protocol
  Tejun Heo, 2011-05-02 (1 file, -0/+23)

  Update numaq such that it calls numa_add_memblk() and sets numa_nodes_parsed instead of directly diddling with NUMA states. The original get_memcfg_numaq() is renamed to numaq_numa_init() and a new get_memcfg_numaq() is created in numa_32.c.

  The shim numa_add_memblk() implementation handles node_start/end_pfn[] and node_set_online() for nodes with memory. The new get_memcfg_numaq() is exactly the same as get_memcfg_from_srat() other than calling the numaq init function. The things get_memcfg_numaq() does are not strictly necessary for numaq but are added for consistency and to help unify NUMA init handling.

  Signed-off-by: Tejun Heo <tj@kernel.org>
  Cc: Ingo Molnar <mingo@redhat.com>
  Cc: Yinghai Lu <yinghai@kernel.org>
  Cc: David Rientjes <rientjes@google.com>
  Cc: Thomas Gleixner <tglx@linutronix.de>
  Cc: "H. Peter Anvin" <hpa@zytor.com>

| * x86-32, NUMA: Replace srat_32.c with srat.c
  Tejun Heo, 2011-05-02 (3 files, -285/+24)

  The SRAT support implementations in srat_32.c and srat.c are generally similar; however, there are some differences. First of all, the 64bit implementation supports more types of SRAT entries: 64bit supports x2apic, affinity, memory and SLIT, while 32bit only supports processor and memory. Most other differences stem from the different initialization protocols employed by the 64bit and 32bit NUMA init paths.

  On 64bit:

  * Mappings among PXM, node and apicid are done directly in each SRAT entry callback.
  * Memory affinity information is passed to numa_add_memblk(), which takes care of all interfacing with NUMA init.
  * NUMA configurations aren't initialized directly. All the information is recorded in numa_nodes_parsed and memblks.

  On 32bit:

  * numa_off is checked.
  * Things go through one more level of indirection via private tables but eventually end up initializing the same mappings.
  * node_start/end_pfn[] are initialized and memblock_x86_register_active_regions() is called for each memory chunk.
  * node_set_online() is called for each online node.
  * sort_node_map() is called.

  There are also other minor differences in sanity checking and messages, but taking the 64bit version should be good enough.

  This patch drops the 32bit-specific implementation and makes the 64bit implementation common for both 32 and 64bit. The init protocol differences are dealt with in two places - the numa_add_memblk() shim added in the previous patch and a new temporary numa_32.c:get_memcfg_from_srat() which wraps the invocation of x86_acpi_numa_init().

  The shim numa_add_memblk() handles the following:

  * node_start/end_pfn[] initialization.
  * node_set_online() for memory nodes.
  * Invocation of memblock_x86_register_active_regions().

  The shim get_memcfg_from_srat() handles the following:

  * numa_off check.
  * node_set_online() for CPU nodes.
  * sort_node_map() invocation.
  * Clearing of numa_nodes_parsed and active_ranges on failure.

  The shims are temporary and will be removed as the generic NUMA init path on 32bit is replaced with the 64bit one.

  Signed-off-by: Tejun Heo <tj@kernel.org>
  Cc: Ingo Molnar <mingo@redhat.com>
  Cc: Yinghai Lu <yinghai@kernel.org>
  Cc: David Rientjes <rientjes@google.com>
  Cc: Thomas Gleixner <tglx@linutronix.de>
  Cc: "H. Peter Anvin" <hpa@zytor.com>

| * x86-32, NUMA: implement temporary NUMA init shims
  Tejun Heo, 2011-05-02 (1 file, -0/+34)

  To help the transition to common NUMA init, implement temporary 32bit shims for numa_add_memblk() and numa_set_distance(). numa_add_memblk() registers the memblk and adjusts node_start/end_pfn[]. numa_set_distance() is a noop.

  These shims will allow using the 64bit NUMA init functions on 32bit and a gradual transition to the common NUMA init path (a sketch of the shim idea follows below). For a detailed description, please read the descriptions of the commits which make use of the shim functions.

  Signed-off-by: Tejun Heo <tj@kernel.org>
  Cc: Ingo Molnar <mingo@redhat.com>
  Cc: Yinghai Lu <yinghai@kernel.org>
  Cc: David Rientjes <rientjes@google.com>
  Cc: Thomas Gleixner <tglx@linutronix.de>
  Cc: "H. Peter Anvin" <hpa@zytor.com>

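  An illustrative shim in the spirit described above, not the actual patch: translate a 64bit-style numa_add_memblk() call into node_start/end_pfn[] bookkeeping, and make the distance setter a deliberate no-op:

    #include <stdint.h>

    #define MAX_NUMNODES 8
    #define PAGE_SHIFT   12

    static unsigned long node_start_pfn[MAX_NUMNODES];
    static unsigned long node_end_pfn[MAX_NUMNODES];

    static int numa_add_memblk_shim(int nid, uint64_t start, uint64_t end)
    {
        unsigned long spfn = (unsigned long)(start >> PAGE_SHIFT);
        unsigned long epfn = (unsigned long)(end >> PAGE_SHIFT);

        /* grow the node's pfn range to cover the new block */
        if (!node_end_pfn[nid] || spfn < node_start_pfn[nid])
            node_start_pfn[nid] = spfn;
        if (epfn > node_end_pfn[nid])
            node_end_pfn[nid] = epfn;
        return 0;
    }

    /* distances are ignored on 32bit, so the shim does nothing */
    static void numa_set_distance_shim(int from, int to, int distance) { }

    int main(void)
    {
        numa_add_memblk_shim(0, 0, 1ULL << 30);
        numa_set_distance_shim(0, 0, 10);
        return node_end_pfn[0] != (1UL << (30 - PAGE_SHIFT));
    }
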
| * x86, NUMA: Move numa_nodes_parsed to numa.[hc]
  Tejun Heo, 2011-05-02 (2 files, -2/+1)

  Move numa_nodes_parsed from numa_64.[hc] to numa.[hc] to prepare for NUMA init path unification.

  Signed-off-by: Tejun Heo <tj@kernel.org>
  Cc: Ingo Molnar <mingo@redhat.com>
  Cc: Yinghai Lu <yinghai@kernel.org>
  Cc: David Rientjes <rientjes@google.com>
  Cc: Thomas Gleixner <tglx@linutronix.de>
  Cc: "H. Peter Anvin" <hpa@zytor.com>

| * x86-32, NUMA: Move get_memcfg_numa() into numa_32.c
  Tejun Heo, 2011-05-02 (1 file, -1/+10)

  There's no reason for get_memcfg_numa() to be implemented inline in mmzone_32.h. Move it to numa_32.c and also make get_memcfg_numa_flat() static.

  Signed-off-by: Tejun Heo <tj@kernel.org>
  Cc: Ingo Molnar <mingo@redhat.com>
  Cc: Yinghai Lu <yinghai@kernel.org>
  Cc: David Rientjes <rientjes@google.com>
  Cc: Thomas Gleixner <tglx@linutronix.de>
  Cc: "H. Peter Anvin" <hpa@zytor.com>

| * x86, NUMA: make srat.c 32bit safe
  Tejun Heo, 2011-05-02 (1 file, -2/+2)

  Make srat.c 32bit safe by removing the assumption that unsigned long is 64bit.

  Signed-off-by: Tejun Heo <tj@kernel.org>
  Cc: Ingo Molnar <mingo@redhat.com>
  Cc: Yinghai Lu <yinghai@kernel.org>
  Cc: David Rientjes <rientjes@google.com>
  Cc: Thomas Gleixner <tglx@linutronix.de>
  Cc: "H. Peter Anvin" <hpa@zytor.com>

| * x86, NUMA: rename srat_64.c to srat.c
  Tejun Heo, 2011-05-02 (2 files, -1/+4)

  Rename srat_64.c to srat.c. This is to prepare for the unification of the NUMA init paths between 32 and 64bit.

  Signed-off-by: Tejun Heo <tj@kernel.org>
  Cc: Ingo Molnar <mingo@redhat.com>
  Cc: Yinghai Lu <yinghai@kernel.org>
  Cc: David Rientjes <rientjes@google.com>
  Cc: Thomas Gleixner <tglx@linutronix.de>
  Cc: "H. Peter Anvin" <hpa@zytor.com>

| * x86, NUMA: trivial cleanups
  Tejun Heo, 2011-05-02 (1 file, -1/+1)

  * Kill the no longer used struct bootnode.
  * Kill the dangling declaration of pxm_to_nid() in numa_32.h.
  * Make setup_node_bootmem() static.

  Signed-off-by: Tejun Heo <tj@kernel.org>
  Cc: Ingo Molnar <mingo@redhat.com>
  Cc: Yinghai Lu <yinghai@kernel.org>
  Cc: David Rientjes <rientjes@google.com>
  Cc: Thomas Gleixner <tglx@linutronix.de>
  Cc: "H. Peter Anvin" <hpa@zytor.com>

| * x86-32, NUMA: use sparse_memory_present_with_active_regions()
  Tejun Heo, 2011-05-02 (3 files, -8/+2)

  Instead of calling memory_present() for each region from NUMA init, call sparse_memory_present_with_active_regions() from paging_init(), similarly to x86-64.

  For flat and numaq, this results in exactly the same memory_present() calls. For srat, if there are multiple memory chunks for a node, after this change memory_present() will be called separately for each chunk instead of being called once to encompass the whole range, which doesn't cause any harm and actually is the better behavior.

  Signed-off-by: Tejun Heo <tj@kernel.org>
  Cc: Ingo Molnar <mingo@redhat.com>
  Cc: Yinghai Lu <yinghai@kernel.org>
  Cc: David Rientjes <rientjes@google.com>
  Cc: Thomas Gleixner <tglx@linutronix.de>
  Cc: "H. Peter Anvin" <hpa@zytor.com>

| * x86, NUMA: Unify 32/64bit numa_cpu_node() implementation
  Tejun Heo, 2011-05-02 (3 files, -14/+9)

  Currently, the only meaningful user of apic->x86_32_numa_cpu_node() is NUMAQ, which returns a valid mapping only after the CPU is initialized during SMP bringup; thus, the previous patch to set the apicid -> node mapping in setup_local_APIC() makes __apicid_to_node[] always contain the correct mapping whether a custom apic->x86_32_numa_cpu_node() is used or not.

  So there is no reason to keep a separate 32bit implementation. We can always consult __apicid_to_node[]. Move the 64bit implementation from numa_64.c to numa.c and remove the 32bit implementation from numa_32.c.

  Signed-off-by: Tejun Heo <tj@kernel.org>
  Cc: Ingo Molnar <mingo@redhat.com>
  Cc: Yinghai Lu <yinghai@kernel.org>
  Cc: David Rientjes <rientjes@google.com>
  Cc: Thomas Gleixner <tglx@linutronix.de>
  Cc: "H. Peter Anvin" <hpa@zytor.com>

| * x86-64, NUMA: simplify nodedata allocation
  Tejun Heo, 2011-05-02 (1 file, -36/+17)

  With top-down memblock allocation, the allocation range limits in early_node_mem() can be simplified: try node-local first, then any node, but in any case don't allocate below the DMA limit. Remove early_node_mem() and implement the simplified allocation directly in setup_node_bootmem().

  Signed-off-by: Tejun Heo <tj@kernel.org>
  Cc: Ingo Molnar <mingo@redhat.com>
  Cc: Yinghai Lu <yinghai@kernel.org>
  Cc: David Rientjes <rientjes@google.com>
  Cc: Thomas Gleixner <tglx@linutronix.de>
  Cc: "H. Peter Anvin" <hpa@zytor.com>

| * x86-64, NUMA: trivial cleanups for setup_node_bootmem()
  Tejun Heo, 2011-05-02 (1 file, -29/+23)

  Make the following trivial changes in preparation for further updates:

  * nodeid -> nid, nid -> tnid
  * use the nd_ prefix for nodedata-related variables
  * remove start/end_pfn and use start/end directly

  Signed-off-by: Tejun Heo <tj@kernel.org>
  Cc: Ingo Molnar <mingo@redhat.com>
  Cc: Yinghai Lu <yinghai@kernel.org>
  Cc: David Rientjes <rientjes@google.com>
  Cc: Thomas Gleixner <tglx@linutronix.de>
  Cc: "H. Peter Anvin" <hpa@zytor.com>

| * x86-64, NUMA: Simplify hotadd memory handling
  Tejun Heo, 2011-05-02 (3 files, -86/+22)

  The only special handling NUMA needs to do for hotadd memory is determining the node for the hotadd memory given its address, and there's nothing specific to the config method used.

  srat_64.c does somewhat elaborate error checking on ACPI_SRAT_MEM_HOT_PLUGGABLE regions, remembers them, and implements memory_add_physaddr_to_nid(), which determines the node for a given hotadd address.

  This is almost completely redundant. All the information is already available to the generic NUMA code, which already performs all the sanity checking and merging. All that's necessary is not using __initdata for numa_meminfo and providing a function which uses it to map an address to a node.

  Drop the specific implementation from srat_64.c and add a generic memory_add_physaddr_to_nid() in numa_64.c, which is enabled if CONFIG_MEMORY_HOTPLUG is set. Other than dropping the code, srat_64.c doesn't need any change, as it already calls numa_add_memblk() for hot-pluggable regions, which is enough.

  While at it, change CONFIG_MEMORY_HOTPLUG_SPARSE in srat_64.c to CONFIG_MEMORY_HOTPLUG; for NUMA on x86-64, the two are always the same.

  Signed-off-by: Tejun Heo <tj@kernel.org>
  Cc: Ingo Molnar <mingo@redhat.com>
  Cc: Yinghai Lu <yinghai@kernel.org>
  Cc: David Rientjes <rientjes@google.com>
  Cc: Thomas Gleixner <tglx@linutronix.de>
  Cc: "H. Peter Anvin" <hpa@zytor.com>

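  A hedged sketch of a generic address-to-node lookup over retained meminfo; the struct here is a stand-in, not the kernel's numa_meminfo:

    #include <stdint.h>

    struct meminfo_blk { uint64_t start, end; int nid; };

    static int addr_to_nid(const struct meminfo_blk *blk, int nr, uint64_t addr)
    {
        int i;

        /* walk the kept-around memblks and return the owning node */
        for (i = 0; i < nr; i++)
            if (addr >= blk[i].start && addr < blk[i].end)
                return blk[i].nid;
        return 0; /* fall back to node 0 for unknown addresses */
    }

    int main(void)
    {
        struct meminfo_blk blks[] = {
            { 0,         1ULL << 30, 0 },
            { 1ULL << 30, 2ULL << 30, 1 },
        };

        return addr_to_nid(blks, 2, (1ULL << 30) + 4096) != 1;
    }
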
| * Merge branch 'x86/urgent' into x86-mm
  Tejun Heo, 2011-05-02 (4 files, -34/+23)

  Merge reason: Pick up the following two fix commits.

    2be19102b7: x86, NUMA: Fix empty memblk detection in numa_cleanup_meminfo()
    765af22da8: x86-32, NUMA: Fix ACPI NUMA init broken by recent x86-64 change

  Scheduled NUMA init 32/64bit unification changes depend on these.

  Signed-off-by: Tejun Heo <tj@kernel.org>

| * Merge branch 'x86/numa' into x86-mm
  Tejun Heo, 2011-05-02 (2 files, -158/+111)

  Merge reason: Pick up x86-32 remap allocator cleanup changes - 14 commits, 3fe14ab541^..993ba1585c.

    3fe14ab541: x86-32, numa: Fix failure condition check in alloc_remap()
    993ba1585c: x86-32, numa: Update remap allocator comments

  Scheduled NUMA init 32/64bit unification changes depend on them.

  Signed-off-by: Tejun Heo <tj@kernel.org>

| | * | x86-32, numa: Update remap allocator comments
  Tejun Heo, 2011-04-06 (1 file, -14/+42)

  Now that the remap allocator is cleaned up, update the comments so that they are in docbook function description format and reflect the actual implementation.

  Signed-off-by: Tejun Heo <tj@kernel.org>
  Link: http://lkml.kernel.org/r/1301955840-7246-15-git-send-email-tj@kernel.org
  Acked-by: Yinghai Lu <yinghai@kernel.org>
  Cc: David Rientjes <rientjes@google.com>
  Signed-off-by: H. Peter Anvin <hpa@zytor.com>

| | * | x86-32, numa: Remove redundant node_remap_size[]
  Tejun Heo, 2011-04-06 (1 file, -6/+4)

  The remap area size can be determined from node_remap_start_vaddr[] and node_remap_end_vaddr[], making node_remap_size[] redundant. Remove it.

  While at it, make resume_map_numa_kva() use @nr_pages for the number of pages instead of @size.

  Signed-off-by: Tejun Heo <tj@kernel.org>
  Link: http://lkml.kernel.org/r/1301955840-7246-14-git-send-email-tj@kernel.org
  Acked-by: Yinghai Lu <yinghai@kernel.org>
  Cc: David Rientjes <rientjes@google.com>
  Signed-off-by: H. Peter Anvin <hpa@zytor.com>

| | * | x86-32, numa: Remove now useless node_remap_offset[]
  Tejun Heo, 2011-04-06 (1 file, -11/+6)

  With lowmem address reservation moved into init_alloc_remap(), node_remap_offset[] is no longer useful. Remove it and the related offset handling code.

  Signed-off-by: Tejun Heo <tj@kernel.org>
  Link: http://lkml.kernel.org/r/1301955840-7246-13-git-send-email-tj@kernel.org
  Acked-by: Yinghai Lu <yinghai@kernel.org>
  Cc: David Rientjes <rientjes@google.com>
  Signed-off-by: H. Peter Anvin <hpa@zytor.com>

| | * | x86-32, numa: Make pgdat allocation use alloc_remap()
  Tejun Heo, 2011-04-06 (1 file, -4/+3)

  pgdat allocation is handled differently from other remap allocations: it's reserved during initialization. There's no reason to handle it any differently. The remap allocator is initialized for every node, and if init failed the allocation will fail, so pgdat allocation can fall back to generic code like everyone else.

  Remove the special init-time pgdat reservation and make allocate_pgdat() use alloc_remap() like everyone else.

  Signed-off-by: Tejun Heo <tj@kernel.org>
  Link: http://lkml.kernel.org/r/1301955840-7246-12-git-send-email-tj@kernel.org
  Acked-by: Yinghai Lu <yinghai@kernel.org>
  Cc: David Rientjes <rientjes@google.com>
  Signed-off-by: H. Peter Anvin <hpa@zytor.com>

| | * | x86-32, numa: Move remapping for remap allocator into init_alloc_remap()
  Tejun Heo, 2011-04-06 (1 file, -22/+7)

  There's no reason to perform the actual remapping separately. Collapse remap_numa_kva() into init_alloc_remap() and, while at it, make it less verbose.

  Signed-off-by: Tejun Heo <tj@kernel.org>
  Link: http://lkml.kernel.org/r/1301955840-7246-11-git-send-email-tj@kernel.org
  Acked-by: Yinghai Lu <yinghai@kernel.org>
  Cc: David Rientjes <rientjes@google.com>
  Signed-off-by: H. Peter Anvin <hpa@zytor.com>

| | * | x86-32, numa: Move lowmem address space reservation to init_alloc_remap()
  Tejun Heo, 2011-04-06 (1 file, -57/+25)

  Remap alloc init is done in the following stages:

  1. init_alloc_remap() calculates how much memory is necessary for each node and reserves node-local memory.
  2. initmem_init() collects how much each node needs and reserves a single contiguous lowmem area which can contain all of it.
  3. init_remap_allocator() initializes allocator parameters from the determined lowmem address and per-node offsets.
  4. The actual remap happens.

  There is no reason for the lowmem remap area to be reserved as a single contiguous area in one go. The areas don't interact with each other and the memblock allocator will put them side-by-side anyway.

  This patch breaks up the single lowmem address reservation, puts the per-node lowmem address reservation into init_alloc_remap(), and initializes the allocator parameters directly in that function, as all the addresses are determined there. This merges steps 2 and 3 into 1.

  While at it, remove the now largely irrelevant comments in init_alloc_remap().

  This change causes the following behavior changes:

  * Remap lowmem areas are allocated in smaller per-node chunks.
  * Remap lowmem area reservation failure fails future remap allocations instead of panicking.
  * Remap allocator initialization is less verbose.

  Signed-off-by: Tejun Heo <tj@kernel.org>
  Link: http://lkml.kernel.org/r/1301955840-7246-10-git-send-email-tj@kernel.org
  Acked-by: Yinghai Lu <yinghai@kernel.org>
  Cc: David Rientjes <rientjes@google.com>
  Signed-off-by: H. Peter Anvin <hpa@zytor.com>

| | * | x86-32, numa: Make init_alloc_remap() less panicky
  Tejun Heo, 2011-04-06 (1 file, -2/+5)

  Remap allocator failure isn't fatal. The callers are required to fall back to regular early memory allocation mechanisms on failure anyway, so there's no reason to panic on remap init failure. Whining and returning are enough.

  Signed-off-by: Tejun Heo <tj@kernel.org>
  Link: http://lkml.kernel.org/r/1301955840-7246-9-git-send-email-tj@kernel.org
  Acked-by: Yinghai Lu <yinghai@kernel.org>
  Cc: David Rientjes <rientjes@google.com>
  Signed-off-by: H. Peter Anvin <hpa@zytor.com>

| | * | x86-32, numa: Calculate remap size in common code
  Tejun Heo, 2011-04-06 (2 files, -7/+4)

  Only pgdat and memmap use the remap area, and there isn't much benefit in allowing per-node override. In addition, the use of node_remap_size[] is confusing in that it contains the number of bytes before remap initialization and the number of pages afterwards.

  Move the remap size calculation for memmap from the specific NUMA config implementations to init_alloc_remap() and make node_remap_size[] static.

  The only behavior difference is that, before this patch, numaq_32 didn't consider max_pfn when calculating the memmap size, but it's enforced after this patch, which is the right thing to do.

  Signed-off-by: Tejun Heo <tj@kernel.org>
  Link: http://lkml.kernel.org/r/1301955840-7246-8-git-send-email-tj@kernel.org
  Acked-by: Yinghai Lu <yinghai@kernel.org>
  Cc: David Rientjes <rientjes@google.com>
  Signed-off-by: H. Peter Anvin <hpa@zytor.com>

| | * | x86-32, numa: Make @size in init_alloc_remap() represent bytes
  Tejun Heo, 2011-04-06 (1 file, -11/+7)

  The @size variable in init_alloc_remap() is confusing in that it starts as a number of bytes, as its name implies, and then becomes a number of pages. Make it consistently represent bytes.

  Signed-off-by: Tejun Heo <tj@kernel.org>
  Link: http://lkml.kernel.org/r/1301955840-7246-7-git-send-email-tj@kernel.org
  Acked-by: Yinghai Lu <yinghai@kernel.org>
  Cc: David Rientjes <rientjes@google.com>
  Signed-off-by: H. Peter Anvin <hpa@zytor.com>

| | * | x86-32, numa: Rename @node_kva to @node_pa in init_alloc_remap()
  Tejun Heo, 2011-04-06 (1 file, -10/+9)

  init_alloc_remap() is about to do more, and using the _kva suffix for a physical address becomes confusing because the function will be handling both physical and virtual addresses. Rename @node_kva to @node_pa.

  This is a trivial rename and doesn't cause any behavior difference.

  Signed-off-by: Tejun Heo <tj@kernel.org>
  Link: http://lkml.kernel.org/r/1301955840-7246-6-git-send-email-tj@kernel.org
  Signed-off-by: H. Peter Anvin <hpa@zytor.com>

| | * | x86-32, numa: Reorganize calculate_numa_remap_pages()
  Tejun Heo, 2011-04-06 (1 file, -63/+62)

  Separate the outer node walking loop and the per-node logic from calculate_numa_remap_pages(). The outer loop is collapsed into initmem_init() and the per-node logic is moved into a new function - init_alloc_remap().

  The new function name is confusing given the existing init_remap_allocator(), and the behavior of the function isn't very clean either at this point, but this is to prepare for further cleanups and it will become prettier.

  This patch doesn't introduce any behavior change.

  Signed-off-by: Tejun Heo <tj@kernel.org>
  Link: http://lkml.kernel.org/r/1301955840-7246-5-git-send-email-tj@kernel.org
  Acked-by: Yinghai Lu <yinghai@kernel.org>
  Cc: David Rientjes <rientjes@google.com>
  Signed-off-by: H. Peter Anvin <hpa@zytor.com>

| | * | x86-32, numa: Remove redundant top-down alloc code from remap initialization
  Tejun Heo, 2011-04-06 (1 file, -29/+14)

  memblock_find_in_range() now does top-down allocation by default, so there's no reason for its callers to explicitly implement it by gradually lowering the start address. Remove the redundant top-down allocation logic from initmem_init() and calculate_numa_remap_pages().

  Signed-off-by: Tejun Heo <tj@kernel.org>
  Link: http://lkml.kernel.org/r/1301955840-7246-4-git-send-email-tj@kernel.org
  Acked-by: Yinghai Lu <yinghai@kernel.org>
  Cc: David Rientjes <rientjes@google.com>
  Signed-off-by: H. Peter Anvin <hpa@zytor.com>

| | * | x86-32, numa: Align pgdat size while initializing alloc_remap
  Tejun Heo, 2011-04-06 (1 file, -1/+2)

  When the pgdat is reserved in init_remap_allocator(), a PAGE_SIZE aligned size will be used. Match the size alignment in initialization to avoid allocation failure down the road.

  Signed-off-by: Tejun Heo <tj@kernel.org>
  Link: http://lkml.kernel.org/r/1301955840-7246-3-git-send-email-tj@kernel.org
  Acked-by: Yinghai Lu <yinghai@kernel.org>
  Cc: David Rientjes <rientjes@google.com>
  Signed-off-by: H. Peter Anvin <hpa@zytor.com>

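  A small sketch of the alignment fix, with ALIGN() written out as the kernel defines it and pg_data_t replaced by a stand-in struct: size the reservation the same PAGE_SIZE-aligned way the allocator will later hand it out, so the remap area is never one page too small.

    #include <stddef.h>

    #define PAGE_SIZE   4096UL
    #define ALIGN(x, a) (((x) + (a) - 1) & ~((a) - 1))

    struct pg_data { char data[5000]; }; /* stand-in for pg_data_t */

    static size_t pgdat_reserve_size(void)
    {
        /* round up to a full page, matching the later allocation */
        return ALIGN(sizeof(struct pg_data), PAGE_SIZE);
    }

    int main(void)
    {
        return pgdat_reserve_size() != 2 * PAGE_SIZE;
    }
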