summaryrefslogtreecommitdiffstats
path: root/include/linux/slub_def.h
Commit message (Collapse)AuthorAgeFilesLines
* mm: slub: share object_err functionAndrey Ryabinin2015-02-131-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | Remove static and add function declarations to linux/slub_def.h so it could be used by kernel address sanitizer. Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Konstantin Serebryany <kcc@google.com> Cc: Dmitry Chernenkov <dmitryc@google.com> Signed-off-by: Andrey Konovalov <adech.fo@gmail.com> Cc: Yuri Gribov <tetra2005@gmail.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Christoph Lameter <cl@linux.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: slub: introduce virt_to_obj functionAndrey Ryabinin2015-02-131-0/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | virt_to_obj takes kmem_cache address, address of slab page, address x pointing somewhere inside slab object, and returns address of the beginning of object. Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Konstantin Serebryany <kcc@google.com> Cc: Dmitry Chernenkov <dmitryc@google.com> Signed-off-by: Andrey Konovalov <adech.fo@gmail.com> Cc: Yuri Gribov <tetra2005@gmail.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Christoph Lameter <cl@linux.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* slab: embed memcg_cache_params to kmem_cacheVladimir Davydov2015-02-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, kmem_cache stores a pointer to struct memcg_cache_params instead of embedding it. The rationale is to save memory when kmem accounting is disabled. However, the memcg_cache_params has shrivelled drastically since it was first introduced: * Initially: struct memcg_cache_params { bool is_root_cache; union { struct kmem_cache *memcg_caches[0]; struct { struct mem_cgroup *memcg; struct list_head list; struct kmem_cache *root_cache; bool dead; atomic_t nr_pages; struct work_struct destroy; }; }; }; * Now: struct memcg_cache_params { bool is_root_cache; union { struct { struct rcu_head rcu_head; struct kmem_cache *memcg_caches[0]; }; struct { struct mem_cgroup *memcg; struct kmem_cache *root_cache; }; }; }; So the memory saving does not seem to be a clear win anymore. OTOH, keeping a pointer to memcg_cache_params struct instead of embedding it results in touching one more cache line on kmem alloc/free hot paths. Besides, it makes linking kmem caches in a list chained by a field of struct memcg_cache_params really painful due to a level of indirection, while I want to make them linked in the following patch. That said, let us embed it. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* slub: use sysfs'es release mechanism for kmem_cacheChristoph Lameter2014-05-061-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | debugobjects warning during netfilter exit: ------------[ cut here ]------------ WARNING: CPU: 6 PID: 4178 at lib/debugobjects.c:260 debug_print_object+0x8d/0xb0() ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x20 Modules linked in: CPU: 6 PID: 4178 Comm: kworker/u16:2 Tainted: G W 3.11.0-next-20130906-sasha #3984 Workqueue: netns cleanup_net Call Trace: dump_stack+0x52/0x87 warn_slowpath_common+0x8c/0xc0 warn_slowpath_fmt+0x46/0x50 debug_print_object+0x8d/0xb0 __debug_check_no_obj_freed+0xa5/0x220 debug_check_no_obj_freed+0x15/0x20 kmem_cache_free+0x197/0x340 kmem_cache_destroy+0x86/0xe0 nf_conntrack_cleanup_net_list+0x131/0x170 nf_conntrack_pernet_exit+0x5d/0x70 ops_exit_list+0x5e/0x70 cleanup_net+0xfb/0x1c0 process_one_work+0x338/0x550 worker_thread+0x215/0x350 kthread+0xe7/0xf0 ret_from_fork+0x7c/0xb0 Also during dcookie cleanup: WARNING: CPU: 12 PID: 9725 at lib/debugobjects.c:260 debug_print_object+0x8c/0xb0() ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x20 Modules linked in: CPU: 12 PID: 9725 Comm: trinity-c141 Not tainted 3.15.0-rc2-next-20140423-sasha-00018-gc4ff6c4 #408 Call Trace: dump_stack (lib/dump_stack.c:52) warn_slowpath_common (kernel/panic.c:430) warn_slowpath_fmt (kernel/panic.c:445) debug_print_object (lib/debugobjects.c:262) __debug_check_no_obj_freed (lib/debugobjects.c:697) debug_check_no_obj_freed (lib/debugobjects.c:726) kmem_cache_free (mm/slub.c:2689 mm/slub.c:2717) kmem_cache_destroy (mm/slab_common.c:363) dcookie_unregister (fs/dcookies.c:302 fs/dcookies.c:343) event_buffer_release (arch/x86/oprofile/../../../drivers/oprofile/event_buffer.c:153) __fput (fs/file_table.c:217) ____fput (fs/file_table.c:253) task_work_run (kernel/task_work.c:125 (discriminator 1)) do_notify_resume (include/linux/tracehook.h:196 arch/x86/kernel/signal.c:751) int_signal (arch/x86/kernel/entry_64.S:807) Sysfs has a release mechanism. Use that to release the kmem_cache structure if CONFIG_SYSFS is enabled. Only slub is changed - slab currently only supports /proc/slabinfo and not /sys/kernel/slab/*. We talked about adding that and someone was working on it. [akpm@linux-foundation.org: fix CONFIG_SYSFS=n build] [akpm@linux-foundation.org: fix CONFIG_SYSFS=n build even more] Signed-off-by: Christoph Lameter <cl@linux.com> Reported-by: Sasha Levin <sasha.levin@oracle.com> Tested-by: Sasha Levin <sasha.levin@oracle.com> Acked-by: Greg KH <greg@kroah.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Pekka Enberg <penberg@kernel.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* slub: rework sysfs layout for memcg cachesVladimir Davydov2014-04-071-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, we try to arrange sysfs entries for memcg caches in the same manner as for global caches. Apart from turning /sys/kernel/slab into a mess when there are a lot of kmem-active memcgs created, it actually does not work properly - we won't create more than one link to a memcg cache in case its parent is merged with another cache. For instance, if A is a root cache merged with another root cache B, we will have the following sysfs setup: X A -> X B -> X where X is some unique id (see create_unique_id()). Now if memcgs M and N start to allocate from cache A (or B, which is the same), we will get: X X:M X:N A -> X B -> X A:M -> X:M A:N -> X:N Since B is an alias for A, we won't get entries B:M and B:N, which is confusing. It is more logical to have entries for memcg caches under the corresponding root cache's sysfs directory. This would allow us to keep sysfs layout clean, and avoid such inconsistencies like one described above. This patch does the trick. It creates a "cgroup" kset in each root cache kobject to keep its children caches there. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: David Rientjes <rientjes@google.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Glauber Costa <glommer@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'slab/next' of ↵Linus Torvalds2013-11-221-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux Pull SLAB changes from Pekka Enberg: "The patches from Joonsoo Kim switch mm/slab.c to use 'struct page' for slab internals similar to mm/slub.c. This reduces memory usage and improves performance: https://lkml.org/lkml/2013/10/16/155 Rest of the changes are bug fixes from various people" * 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux: (21 commits) mm, slub: fix the typo in mm/slub.c mm, slub: fix the typo in include/linux/slub_def.h slub: Handle NULL parameter in kmem_cache_flags slab: replace non-existing 'struct freelist *' with 'void *' slab: fix to calm down kmemleak warning slub: proper kmemleak tracking if CONFIG_SLUB_DEBUG disabled slab: rename slab_bufctl to slab_freelist slab: remove useless statement for checking pfmemalloc slab: use struct page for slab management slab: replace free and inuse in struct slab with newly introduced active slab: remove SLAB_LIMIT slab: remove kmem_bufctl_t slab: change the management method of free objects of the slab slab: use __GFP_COMP flag for allocating slab pages slab: use well-defined macro, virt_to_slab() slab: overloading the RCU head over the LRU for RCU free slab: remove cachep in struct slab_rcu slab: remove nodeid in struct slab slab: remove colouroff in struct slab slab: change return type of kmem_getpages() to struct page ...
| * mm, slub: fix the typo in include/linux/slub_def.hZhi Yong Wu2013-11-111-1/+1
| | | | | | | | | | | | Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
* | slub: remove verify_mem_not_deleted()Christoph Lameter2013-09-041-13/+0
| | | | | | | | | | | | | | I do not see any user for this code in the tree. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
* | mm/sl[aou]b: Move kmallocXXX functions to common codeChristoph Lameter2013-09-041-97/+0
|/ | | | | | | | | | | | | | | | | | The kmalloc* functions of all slab allcoators are similar now so lets move them into slab.h. This requires some function naming changes in slob. As a results of this patch there is a common set of functions for all allocators. Also means that kmalloc_large() is now available in general to perform large order allocations that go directly via the page allocator. kmalloc_large() can be substituted if kmalloc() throws warnings because of too large allocations. kmalloc_large() has exactly the same semantics as kmalloc but can only used for allocations > PAGE_SIZE. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
* slab: Common definition for kmem_cache_nodeChristoph Lameter2013-02-011-11/+0
| | | | | | | | | Put the definitions for the kmem_cache_node structures together so that we have one structure. That will allow us to create more common fields in the future which could yield more opportunities to share code. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
* slab: Common Kmalloc cache determinationChristoph Lameter2013-02-011-31/+10
| | | | | | | | | | | | | Extract the optimized lookup functions from slub and put them into slab_common.c. Then make slab use these functions as well. Joonsoo notes that this fixes some issues with constant folding which also reduces the code size for slub. https://lkml.org/lkml/2012/10/20/82 Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
* slab: Common definition for the array of kmalloc cachesChristoph Lameter2013-02-011-6/+0
| | | | | | | | | Have a common definition fo the kmalloc cache arrays in SLAB and SLUB Acked-by: Glauber Costa <glommer@parallels.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
* slab: Common constants for kmalloc boundariesChristoph Lameter2013-02-011-16/+3
| | | | | | | | | | | | Standardize the constants that describe the smallest and largest object kept in the kmalloc arrays for SLAB and SLUB. Differentiate between the maximum size for which a slab cache is used (KMALLOC_MAX_CACHE_SIZE) and the maximum allocatable size (KMALLOC_MAX_SIZE, KMALLOC_MAX_ORDER). Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
* slab: Common kmalloc slab index determinationChristoph Lameter2013-02-011-63/+0
| | | | | | | | | | | | | Extract the function to determine the index of the slab within the array of kmalloc caches as well as a function to determine maximum object size from the nr of the kmalloc slab. This is used here only to simplify slub bootstrap but will be used later also for SLAB. Acked-by: Glauber Costa <glommer@parallels.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
* slub: slub-specific propagation changesGlauber Costa2012-12-181-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SLUB allows us to tune a particular cache behavior with sysfs-based tunables. When creating a new memcg cache copy, we'd like to preserve any tunables the parent cache already had. This can be done by tapping into the store attribute function provided by the allocator. We of course don't need to mess with read-only fields. Since the attributes can have multiple types and are stored internally by sysfs, the best strategy is to issue a ->show() in the root cache, and then ->store() in the memcg cache. The drawback of that, is that sysfs can allocate up to a page in buffering for show(), that we are likely not to need, but also can't guarantee. To avoid always allocating a page for that, we can update the caches at store time with the maximum attribute size ever stored to the root cache. We will then get a buffer big enough to hold it. The corolary to this, is that if no stores happened, nothing will be propagated. It can also happen that a root cache has its tunables updated during normal system operation. In this case, we will propagate the change to all caches that are already active. [akpm@linux-foundation.org: tweak code to avoid __maybe_unused] Signed-off-by: Glauber Costa <glommer@parallels.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: JoonSoo Kim <js1304@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Rik van Riel <riel@redhat.com> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* sl[au]b: allocate objects from memcg cacheGlauber Costa2012-12-181-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | We are able to match a cache allocation to a particular memcg. If the task doesn't change groups during the allocation itself - a rare event, this will give us a good picture about who is the first group to touch a cache page. This patch uses the now available infrastructure by calling memcg_kmem_get_cache() before all the cache allocations. Signed-off-by: Glauber Costa <glommer@parallels.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: JoonSoo Kim <js1304@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Rik van Riel <riel@redhat.com> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* slab/slub: struct memcg_paramsGlauber Costa2012-12-181-0/+3
| | | | | | | | | | | | | | | | | | | | | | For the kmem slab controller, we need to record some extra information in the kmem_cache structure. Signed-off-by: Glauber Costa <glommer@parallels.com> Signed-off-by: Suleiman Souhlal <suleiman@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: JoonSoo Kim <js1304@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Rik van Riel <riel@redhat.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm, sl[aou]b: Extract common fields from struct kmem_cacheChristoph Lameter2012-06-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | Define a struct that describes common fields used in all slab allocators. A slab allocator either uses the common definition (like SLOB) or is required to provide members of kmem_cache with the definition given. After that it will be possible to share code that only operates on those fields of kmem_cache. The patch basically takes the slob definition of kmem cache and uses the field namees for the other allocators. It also standardizes the names used for basic object lengths in allocators: object_size Struct size specified at kmem_cache_create. Basically the payload expected to be used by the subsystem. size The size of memory allocator for each object. This size is larger than object_size and includes padding, alignment and extra metadata for each object (f.e. for debugging and rcu). Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
* slub: Get rid of the node fieldChristoph Lameter2012-06-011-1/+0
| | | | | | | | | | | The node field is always page_to_nid(c->page). So its rather easy to replace. Note that there maybe slightly more overhead in various hot paths due to the need to shift the bits from page->flags. However, that is mostly compensated for by a smaller footprint of the kmem_cache_cpu structure (this patch reduces that to 3 words per cache) which allows better caching. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
* Merge branch 'slab/for-linus' of ↵Linus Torvalds2012-03-281-2/+4
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux Pull SLAB changes from Pekka Enberg: "There's the new kmalloc_array() API, minor fixes and performance improvements, but quite honestly, nothing terribly exciting." * 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux: mm: SLAB Out-of-memory diagnostics slab: introduce kmalloc_array() slub: per cpu partial statistics change slub: include include for prefetch slub: Do not hold slub_lock when calling sysfs_slab_add() slub: prefetch next freelist pointer in slab_alloc() slab, cleanup: remove unneeded return
| * slub: per cpu partial statistics changeAlex Shi2012-02-181-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | This patch split the cpu_partial_free into 2 parts: cpu_partial_node, PCP refilling times from node partial; and same name cpu_partial_free, PCP refilling times in slab_free slow path. A new statistic 'cpu_partial_drain' is added to get PCP drain to node partial times. These info are useful when do PCP tunning. The slabinfo.c code is unchanged, since cpu_partial_node is not on slow path. Signed-off-by: Alex Shi <alex.shi@intel.com> Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
* | BUG: headers with BUG/BUG_ON etc. need linux/bug.hPaul Gortmaker2012-03-041-0/+1
|/ | | | | | | | | | | | | If a header file is making use of BUG, BUG_ON, BUILD_BUG_ON, or any other BUG variant in a static inline (i.e. not in a #define) then that header really should be including <linux/bug.h> and not just expecting it to be implicitly present. We can make this change risk-free, since if the files using these headers didn't have exposure to linux/bug.h already, they would have been causing compile failures/warnings. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
* slub: correct comments error for per cpu partialAlex Shi2011-09-271-1/+1
| | | | | | | | | Correct comment errors, that mistake cpu partial objects number as pages number, may make reader misunderstand. Signed-off-by: Alex Shi <alex.shi@intel.com> Reviewed-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
* slub: per cpu cache for partial pagesChristoph Lameter2011-08-191-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow filling out the rest of the kmem_cache_cpu cacheline with pointers to partial pages. The partial page list is used in slab_free() to avoid per node lock taking. In __slab_alloc() we can then take multiple partial pages off the per node partial list in one go reducing node lock pressure. We can also use the per cpu partial list in slab_alloc() to avoid scanning partial lists for pages with free objects. The main effect of a per cpu partial list is that the per node list_lock is taken for batches of partial pages instead of individual ones. Potential future enhancements: 1. The pickup from the partial list could be perhaps be done without disabling interrupts with some work. The free path already puts the page into the per cpu partial list without disabling interrupts. 2. __slab_free() may have some code paths that could use optimization. Performance: Before After ./hackbench 100 process 200000 Time: 1953.047 1564.614 ./hackbench 100 process 20000 Time: 207.176 156.940 ./hackbench 100 process 20000 Time: 204.468 156.940 ./hackbench 100 process 20000 Time: 204.879 158.772 ./hackbench 10 process 20000 Time: 20.153 15.853 ./hackbench 10 process 20000 Time: 20.153 15.986 ./hackbench 10 process 20000 Time: 19.363 16.111 ./hackbench 1 process 20000 Time: 2.518 2.307 ./hackbench 1 process 20000 Time: 2.258 2.339 ./hackbench 1 process 20000 Time: 2.864 2.163 Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
* Merge branch 'slub/lockless' of ↵Linus Torvalds2011-07-301-0/+3
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6 * 'slub/lockless' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6: (21 commits) slub: When allocating a new slab also prep the first object slub: disable interrupts in cmpxchg_double_slab when falling back to pagelock Avoid duplicate _count variables in page_struct Revert "SLUB: Fix build breakage in linux/mm_types.h" SLUB: Fix build breakage in linux/mm_types.h slub: slabinfo update for cmpxchg handling slub: Not necessary to check for empty slab on load_freelist slub: fast release on full slab slub: Add statistics for the case that the current slab does not match the node slub: Get rid of the another_slab label slub: Avoid disabling interrupts in free slowpath slub: Disable interrupts in free_debug processing slub: Invert locking and avoid slab lock slub: Rework allocator fastpaths slub: Pass kmem_cache struct to lock and freeze slab slub: explicit list_lock taking slub: Add cmpxchg_double_slab() mm: Rearrange struct page slub: Move page->frozen handling near where the page->freelist handling occurs slub: Do not use frozen page flag but a bit in the page counters ...
| * slub: fast release on full slabChristoph Lameter2011-07-021-0/+1
| | | | | | | | | | | | | | | | | | Make deactivation occur implicitly while checking out the current freelist. This avoids one cmpxchg operation on a slab that is now fully in use. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
| * slub: Add statistics for the case that the current slab does not match the nodeChristoph Lameter2011-07-021-0/+1
| | | | | | | | | | | | | | | | Slub reloads the per cpu slab if the page does not satisfy the NUMA condition. Track those reloads since doing so has a performance impact. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
| * slub: Add cmpxchg_double_slab()Christoph Lameter2011-07-021-0/+1
| | | | | | | | | | | | | | | | Add a function that operates on the second doubleword in the page struct and manipulates the object counters, the freelist and the frozen attribute. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
* | slub: Add method to verify memory is not freedBen Greear2011-07-071-0/+13
| | | | | | | | | | | | | | | | This is for tracking down suspect memory usage. Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
* | slab, slub, slob: Unify alignment definitionChristoph Lameter2011-06-161-10/+0
|/ | | | | | | | | | | | | | Every slab has its on alignment definition in include/linux/sl?b_def.h. Extract those and define a common set in include/linux/slab.h. SLOB: As notes sometimes we need double word alignment on 32 bit. This gives all structures allocated by SLOB a unsigned long long alignment like the others do. SLAB: If ARCH_SLAB_MINALIGN is not set SLAB would set ARCH_SLAB_MINALIGN to zero meaning no alignment at all. Give it the default unsigned long long alignment. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
* slub: Deal with hyperthetical case of PAGE_SIZE > 2MChristoph Lameter2011-05-211-2/+4
| | | | | | | | | | | kmalloc_index() currently returns -1 if the PAGE_SIZE is larger than 2M which seems to cause some concern since the callers do not check for -1. Insert a BUG() and add a comment to the -1 explaining that the code cannot be reached. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
* slub: Remove CONFIG_CMPXCHG_LOCAL ifdefferyChristoph Lameter2011-05-071-2/+0
| | | | | | | | | | | | | | | | | | Remove the #ifdefs. This means that the irqsafe_cpu_cmpxchg_double() is used everywhere. There may be performance implications since: A. We now have to manage a transaction ID for all arches B. The interrupt holdoff for arches not supporting CONFIG_CMPXCHG_LOCAL is reduced to a very short irqoff section. There are no multiple irqoff/irqon sequences as a result of this change. Even in the fallback case we only have to do one disable and enable like before. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
* slub: Add statistics for this_cmpxchg_double failuresChristoph Lameter2011-03-221-0/+1
| | | | | | | Add some statistics for debugging. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
* Merge branch 'slub/lockless' into for-linusPekka Enberg2011-03-201-2/+5
|\ | | | | | | | | Conflicts: include/linux/slub_def.h
| * Lockless (and preemptless) fastpaths for slubChristoph Lameter2011-03-111-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the this_cpu_cmpxchg_double functionality to implement a lockless allocation algorithm on arches that support fast this_cpu_ops. Each of the per cpu pointers is paired with a transaction id that ensures that updates of the per cpu information can only occur in sequence on a certain cpu. A transaction id is a "long" integer that is comprised of an event number and the cpu number. The event number is incremented for every change to the per cpu state. This means that the cmpxchg instruction can verify for an update that nothing interfered and that we are updating the percpu structure for the processor where we picked up the information and that we are also currently on that processor when we update the information. This results in a significant decrease of the overhead in the fastpaths. It also makes it easy to adopt the fast path for realtime kernels since this is lockless and does not require the use of the current per cpu area over the critical section. It is only important that the per cpu area is current at the beginning of the critical section and at the end. So there is no need even to disable preemption. Test results show that the fastpath cycle count is reduced by up to ~ 40% (alloc/free test goes from ~140 cycles down to ~80). The slowpath for kfree adds a few cycles. Sadly this does nothing for the slowpath which is where the main issues with performance in slub are but the best case performance rises significantly. (For that see the more complex slub patches that require cmpxchg_double) Kmalloc: alloc/free test Before: 10000 times kmalloc(8)/kfree -> 134 cycles 10000 times kmalloc(16)/kfree -> 152 cycles 10000 times kmalloc(32)/kfree -> 144 cycles 10000 times kmalloc(64)/kfree -> 142 cycles 10000 times kmalloc(128)/kfree -> 142 cycles 10000 times kmalloc(256)/kfree -> 132 cycles 10000 times kmalloc(512)/kfree -> 132 cycles 10000 times kmalloc(1024)/kfree -> 135 cycles 10000 times kmalloc(2048)/kfree -> 135 cycles 10000 times kmalloc(4096)/kfree -> 135 cycles 10000 times kmalloc(8192)/kfree -> 144 cycles 10000 times kmalloc(16384)/kfree -> 754 cycles After: 10000 times kmalloc(8)/kfree -> 78 cycles 10000 times kmalloc(16)/kfree -> 78 cycles 10000 times kmalloc(32)/kfree -> 82 cycles 10000 times kmalloc(64)/kfree -> 88 cycles 10000 times kmalloc(128)/kfree -> 79 cycles 10000 times kmalloc(256)/kfree -> 79 cycles 10000 times kmalloc(512)/kfree -> 85 cycles 10000 times kmalloc(1024)/kfree -> 82 cycles 10000 times kmalloc(2048)/kfree -> 82 cycles 10000 times kmalloc(4096)/kfree -> 85 cycles 10000 times kmalloc(8192)/kfree -> 82 cycles 10000 times kmalloc(16384)/kfree -> 706 cycles Kmalloc: Repeatedly allocate then free test Before: 10000 times kmalloc(8) -> 211 cycles kfree -> 113 cycles 10000 times kmalloc(16) -> 174 cycles kfree -> 115 cycles 10000 times kmalloc(32) -> 235 cycles kfree -> 129 cycles 10000 times kmalloc(64) -> 222 cycles kfree -> 120 cycles 10000 times kmalloc(128) -> 343 cycles kfree -> 139 cycles 10000 times kmalloc(256) -> 827 cycles kfree -> 147 cycles 10000 times kmalloc(512) -> 1048 cycles kfree -> 272 cycles 10000 times kmalloc(1024) -> 2043 cycles kfree -> 528 cycles 10000 times kmalloc(2048) -> 4002 cycles kfree -> 571 cycles 10000 times kmalloc(4096) -> 7740 cycles kfree -> 628 cycles 10000 times kmalloc(8192) -> 8062 cycles kfree -> 850 cycles 10000 times kmalloc(16384) -> 8895 cycles kfree -> 1249 cycles After: 10000 times kmalloc(8) -> 190 cycles kfree -> 129 cycles 10000 times kmalloc(16) -> 76 cycles kfree -> 123 cycles 10000 times kmalloc(32) -> 126 cycles kfree -> 124 cycles 10000 times kmalloc(64) -> 181 cycles kfree -> 128 cycles 10000 times kmalloc(128) -> 310 cycles kfree -> 140 cycles 10000 times kmalloc(256) -> 809 cycles kfree -> 165 cycles 10000 times kmalloc(512) -> 1005 cycles kfree -> 269 cycles 10000 times kmalloc(1024) -> 1999 cycles kfree -> 527 cycles 10000 times kmalloc(2048) -> 3967 cycles kfree -> 570 cycles 10000 times kmalloc(4096) -> 7658 cycles kfree -> 637 cycles 10000 times kmalloc(8192) -> 8111 cycles kfree -> 859 cycles 10000 times kmalloc(16384) -> 8791 cycles kfree -> 1173 cycles Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
| * slub: min_partial needs to be in first cachelineChristoph Lameter2011-03-111-1/+1
| | | | | | | | | | | | | | | | It is used in unfreeze_slab() which is a performance critical function. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
* | slub: automatically reserve bytes at the end of slabLai Jiangshan2011-03-111-0/+1
|/ | | | | | | | | | | | | | | | | There is no "struct" for slub's slab, it shares with struct page. But struct page is very small, it is insufficient when we need to add some metadata for slab. So we add a field "reserved" to struct kmem_cache, when a slab is allocated, kmem_cache->reserved bytes are automatically reserved at the end of the slab for slab's metadata. Changed from v1: Export the reserved field via sysfs Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
* slub tracing: move trace calls out of always inlined functions to reduce ↵Richard Kennedy2010-11-061-29/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kernel code size Having the trace calls defined in the always inlined kmalloc functions in include/linux/slub_def.h causes a lot of code duplication as the trace functions get instantiated for each kamalloc call site. This can simply be removed by pushing the trace calls down into the functions in slub.c. On my x86_64 built this patch shrinks the code size of the kernel by approx 36K and also shrinks the code size of many modules -- too many to list here ;) size vmlinux (2.6.36) reports text data bss dec hex filename 5410611 743172 828928 6982711 6a8c37 vmlinux 5373738 744244 828928 6946910 6a005e vmlinux + patch The resulting kernel has had some testing & kmalloc trace still seems to work. This patch - moves trace_kmalloc out of the inlined kmalloc() and pushes it down into kmem_cache_alloc_trace() so this it only get instantiated once. - rename kmem_cache_alloc_notrace() to kmem_cache_alloc_trace() to indicate that now is does have tracing. (maybe this would better being called something like kmalloc_kmem_cache ?) - adds a new function kmalloc_order() to handle allocation and tracing of large allocations of page order. - removes tracing from the inlined kmalloc_large() replacing them with a call to kmalloc_order(); - move tracing out of inlined kmalloc_node() and pushing it down into kmem_cache_alloc_node_trace - rename kmem_cache_alloc_node_notrace() to kmem_cache_alloc_node_trace() - removes the include of trace/events/kmem.h from slub_def.h. v2 - keep kmalloc_order_trace inline when !CONFIG_TRACE Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk> Signed-off-by: Pekka Enberg <penberg@kernel.org>
* slub: Enable sysfs support for !CONFIG_SLUB_DEBUGChristoph Lameter2010-10-061-1/+1
| | | | | | | | | | Currently disabling CONFIG_SLUB_DEBUG also disabled SYSFS support meaning that the slabs cannot be tuned without DEBUG. Make SYSFS support independent of CONFIG_SLUB_DEBUG Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
* slub: reduce differences between SMP and NUMAChristoph Lameter2010-10-021-4/+1
| | | | | | | | | | Reduce the #ifdefs and simplify bootstrap by making SMP and NUMA as much alike as possible. This means that there will be an additional indirection to get to the kmem_cache_node field under SMP. Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
* slub: Dynamically size kmalloc cache allocationsChristoph Lameter2010-10-021-5/+2
| | | | | | | | | | | | | | | | | | | kmalloc caches are statically defined and may take up a lot of space just because the sizes of the node array has to be dimensioned for the largest node count supported. This patch makes the size of the kmem_cache structure dynamic throughout by creating a kmem_cache slab cache for the kmem_cache objects. The bootstrap occurs by allocating the initial one or two kmem_cache objects from the page allocator. C2->C3 - Fix various issues indicated by David - Make create kmalloc_cache return a kmem_cache * pointer. Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Pekka Enberg <penberg@kernel.org>
* Merge branch 'for-linus' of ↵Linus Torvalds2010-08-221-1/+1
|\ | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6: slab: fix object alignment slub: add missing __percpu markup in mm/slub_def.h
| * slub: add missing __percpu markup in mm/slub_def.hNamhyung Kim2010-08-091-1/+1
| | | | | | | | | | | | | | | | | | kmem_cache->cpu_slab is a percpu pointer but was missing __percpu markup. Add it. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Pekka Enberg <penberg@kernel.org>
* | dma-mapping: rename ARCH_KMALLOC_MINALIGN to ARCH_DMA_MINALIGNFUJITA Tomonori2010-08-111-3/+5
|/ | | | | | | | | | | | | | | | | | | | | | | | | Now each architecture has the own dma_get_cache_alignment implementation. dma_get_cache_alignment returns the minimum DMA alignment. Architectures define it as ARCH_KMALLOC_MINALIGN (it's used to make sure that malloc'ed buffer is DMA-safe; the buffer doesn't share a cache with the others). So we can unify dma_get_cache_alignment implementations. This patch: dma_get_cache_alignment() needs to know if an architecture defines ARCH_KMALLOC_MINALIGN or not (needs to know if architecture has DMA alignment restriction). However, slab.h define ARCH_KMALLOC_MINALIGN if architectures doesn't define it. Let's rename ARCH_KMALLOC_MINALIGN to ARCH_DMA_MINALIGN. ARCH_KMALLOC_MINALIGN is used only in the internals of slab/slob/slub (except for crypto). Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'perf/core' of ↵Ingo Molnar2010-06-091-1/+2
|\ | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/core
| * tracing: Remove kmemtrace ftrace pluginLi Zefan2010-06-091-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | We have been resisting new ftrace plugins and removing existing ones, and kmemtrace has been superseded by kmem trace events and perf-kmem, so we remove it. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> Cc: Ingo Molnar <mingo@elte.hu> Cc: Steven Rostedt <rostedt@goodmis.org> [ remove kmemtrace from the makefile, handle slob too ] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
* | SLUB: Allow full duplication of kmalloc array for 390Christoph Lameter2010-05-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Commit 756dee75872a2a764b478e18076360b8a4ec9045 ("SLUB: Get rid of dynamic DMA kmalloc cache allocation") makes S390 run out of kmalloc caches. Increase the number of kmalloc caches to a safe size. Cc: <stable@kernel.org> [ .33 and .34 ] Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com> Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
* | slub: move kmem_cache_node into it's own cachelineAlexander Duyck2010-05-241-6/+3
|/ | | | | | | | | | | | | | | | | | | | This patch is meant to improve the performance of SLUB by moving the local kmem_cache_node lock into it's own cacheline separate from kmem_cache. This is accomplished by simply removing the local_node when NUMA is enabled. On my system with 2 nodes I saw around a 5% performance increase w/ hackbench times dropping from 6.2 seconds to 5.9 seconds on average. I suspect the performance gain would increase as the number of nodes increases, but I do not have the data to currently back that up. Bugzilla-Reference: http://bugzilla.kernel.org/show_bug.cgi?id=15713 Cc: <stable@kernel.org> Reported-by: Alex Shi <alex.shi@intel.com> Tested-by: Alex Shi <alex.shi@intel.com> Acked-by: Yanmin Zhang <yanmin_zhang@linux.intel.com> Acked-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
* mm: Move ARCH_SLAB_MINALIGN and ARCH_KMALLOC_MINALIGN to <linux/slub_def.h>David Woodhouse2010-05-191-0/+8
| | | | | | Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
* SLUB: this_cpu: Remove slub kmem_cache fieldsChristoph Lameter2009-12-201-2/+0
| | | | | | | | | | | | | | | | | | | Remove the fields in struct kmem_cache_cpu that were used to cache data from struct kmem_cache when they were in different cachelines. The cacheline that holds the per cpu array pointer now also holds these values. We can cut down the struct kmem_cache_cpu size to almost half. The get_freepointer() and set_freepointer() functions that used to be only intended for the slow path now are also useful for the hot path since access to the size field does not require accessing an additional cacheline anymore. This results in consistent use of functions for setting the freepointer of objects throughout SLUB. Also we initialize all possible kmem_cache_cpu structures when a slab is created. No need to initialize them when a processor or node comes online. Signed-off-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
OpenPOWER on IntegriCloud