summaryrefslogtreecommitdiffstats
path: root/fs/file.c
Commit message (Collapse)AuthorAgeFilesLines
* [PATCH] fdtable: Implement new pagesize-based fdtable allocatorVadim Lobanov2006-12-101-136/+72
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch provides an improved fdtable allocation scheme, useful for expanding fdtable file descriptor entries. The main focus is on the fdarray, as its memory usage grows 128 times faster than that of an fdset. The allocation algorithm sizes the fdarray in such a way that its memory usage increases in easy page-sized chunks. The overall algorithm expands the allowed size in powers of two, in order to amortize the cost of invoking vmalloc() for larger allocation sizes. Namely, the following sizes for the fdarray are considered, and the smallest that accommodates the requested fd count is chosen: pagesize / 4 pagesize / 2 pagesize <- memory allocator switch point pagesize * 2 pagesize * 4 ...etc... Unlike the current implementation, this allocation scheme does not require a loop to compute the optimal fdarray size, and can be done in efficient straightline code. Furthermore, since the fdarray overflows the pagesize boundary long before any of the fdsets do, it makes sense to optimize run-time by allocating both fdsets in a single swoop. Even together, they will still be, by far, smaller than the fdarray. The fdtable->open_fds is now used as the anchor for the fdset memory allocation. Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dipankar Sarma <dipankar@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] fdtable: Remove the free_files fieldVadim Lobanov2006-12-101-19/+8
| | | | | | | | | | | | | | | | | | | An fdtable can either be embedded inside a files_struct or standalone (after being expanded). When an fdtable is being discarded after all RCU references to it have expired, we must either free it directly, in the standalone case, or free the files_struct it is contained within, in the embedded case. Currently the free_files field controls this behavior, but we can get rid of it entirely, as all the necessary information is already recorded. We can distinguish embedded and standalone fdtables using max_fds, and if it is embedded we can divine the relevant files_struct using container_of(). Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dipankar Sarma <dipankar@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] fdtable: Make fdarray and fdsets equal in sizeVadim Lobanov2006-12-101-34/+22
| | | | | | | | | | | | | | | | | | | | | | | | Currently, each fdtable supports three dynamically-sized arrays of data: the fdarray and two fdsets. The code allows the number of fds supported by the fdarray (fdtable->max_fds) to differ from the number of fds supported by each of the fdsets (fdtable->max_fdset). In practice, it is wasteful for these two sizes to differ: whenever we hit a limit on the smaller-capacity structure, we will reallocate the entire fdtable and all the dynamic arrays within it, so any delta in the memory used by the larger-capacity structure will never be touched at all. Rather than hogging this excess, we shouldn't even allocate it in the first place, and keep the capacities of the fdarray and the fdsets equal. This patch removes fdtable->max_fdset. As an added bonus, most of the supporting code becomes simpler. Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dipankar Sarma <dipankar@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] file: kill unnecessary timer in fdtable_deferTejun Heo2006-12-071-27/+2
| | | | | | | | | | | | | | | | | | free_fdtable_rc() schedules timer to reschedule fddef->wq if schedule_work() on it returns 0. However, schedule_work() guarantees that the target work is executed at least once after the scheduling regardless of its return value. 0 return simply means that the work was already pending and thus no further action was required. Another problem is that it used contant '5' as @expires argument to mod_timer(). Kill unnecessary fddef->timer. Signed-off-by: Tejun Heo <htejun@gmail.com> Cc: Dipankar Sarma <dipankar@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* WorkStruct: Pass the work_struct pointer instead of context dataDavid Howells2006-11-221-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pass the work_struct pointer to the work function rather than context data. The work function can use container_of() to work out the data. For the cases where the container of the work_struct may go away the moment the pending bit is cleared, it is made possible to defer the release of the structure by deferring the clearing of the pending bit. To make this work, an extra flag is introduced into the management side of the work_struct. This governs auto-release of the structure upon execution. Ordinarily, the work queue executor would release the work_struct for further scheduling or deallocation by clearing the pending bit prior to jumping to the work function. This means that, unless the driver makes some guarantee itself that the work_struct won't go away, the work function may not access anything else in the work_struct or its container lest they be deallocated.. This is a problem if the auxiliary data is taken away (as done by the last patch). However, if the pending bit is *not* cleared before jumping to the work function, then the work function *may* access the work_struct and its container with no problems. But then the work function must itself release the work_struct by calling work_release(). In most cases, automatic release is fine, so this is the default. Special initiators exist for the non-auto-release case (ending in _NAR). Signed-Off-By: David Howells <dhowells@redhat.com>
* [PATCH] expand_fdtable(): remove pointless unlock+lockAndrew Morton2006-09-291-2/+0
| | | | | | | | This unlock/lock on a super-unlikely path isn't worth the kernel text. Cc: Vadim Lobanov <vlobanov@speakeasy.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Clean up expand_fdtable() and expand_files()Vadim Lobanov2006-09-291-41/+35
| | | | | | | | | | | Perform a code cleanup against the expand_fdtable() and expand_files() functions inside fs/file.c. It aims to make the flow of code within these functions simpler and easier to understand, via added comments and modest refactoring. Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] alloc_fdtable() cleanupAndrew Morton2006-09-271-4/+2
| | | | | | | free_fdset(NULL, ...) is legal. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] alloc_fdtable() expansion fixAndrew Morton2006-07-121-1/+1
| | | | | | | | | | | We're supposed to go the next power of two if nfds==nr. Of `nr', not of `nfsd'. Spotted by Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] fix fdset leakageKirill Korotaev2006-07-121-1/+3
| | | | | | | | | | | | | | When found, it is obvious. nfds calculated when allocating fdsets is rewritten by calculation of size of fdtable, and when we are unlucky, we try to free fdsets of wrong size. Found due to OpenVZ resource management (User Beancounters). Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: Kirill Korotaev <dev@openvz.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] fix weird logic in alloc_fdtable()Andrew Morton2006-07-101-7/+3
| | | | | | | | | | There's a fairly obvious infinite loop in there. Also, use roundup_pow_of_two() rather than open-coding stuff. Cc: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] for_each_possible_cpu: fixes for generic partKAMEZAWA Hiroyuki2006-03-281-1/+1
| | | | | | | | replaces for_each_cpu with for_each_possible_cpu(). Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Shrinks sizeof(files_struct) and better layoutEric Dumazet2006-03-231-20/+14
| | | | | | | | | | | | | | | | | | | | | | | | 1) Reduce the size of (struct fdtable) to exactly 64 bytes on 32bits platforms, lowering kmalloc() allocated space by 50%. 2) Reduce the size of (files_struct), using a special 32 bits (or 64bits) embedded_fd_set, instead of a 1024 bits fd_set for the close_on_exec_init and open_fds_init fields. This save some ram (248 bytes per task) as most tasks dont open more than 32 files. D-Cache footprint for such tasks is also reduced to the minimum. 3) Reduce size of allocated fdset. Currently two full pages are allocated, that is 32768 bits on x86 for example, and way too much. The minimum is now L1_CACHE_BYTES. UP and SMP should benefit from this patch, because most tasks will touch only one cache line when open()/close() stdin/stdout/stderr (0/1/2), (next_fd, close_on_exec_init, open_fds_init, fd_array[0 .. 2] being in the same cache line) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] percpu data: only iterate over possible CPUsEric Dumazet2006-02-051-2/+1
| | | | | | | | | | | | | | | | | | | | | | | percpu_data blindly allocates bootmem memory to store NR_CPUS instances of cpudata, instead of allocating memory only for possible cpus. As a preparation for changing that, we need to convert various 0 -> NR_CPUS loops to use for_each_cpu(). (The above only applies to users of asm-generic/percpu.h. powerpc has gone it alone and is presently only allocating memory for present CPUs, so it's currently corrupting memory). Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: James Bottomley <James.Bottomley@steeleye.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Jens Axboe <axboe@suse.de> Cc: Anton Blanchard <anton@samba.org> Acked-by: William Irwin <wli@holomorphy.com> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Fix the fdtable freeing in the case of vmalloced fdset/arraysDipankar Sarma2005-09-141-7/+3
| | | | | | | | | | | | | | | Noted by David Miller: "The bug is that free_fd_array() takes a "num" argument, but when calling it from __free_fdtable() we're instead passing in the size in bytes (ie. "num * sizeof(struct file *)")." Yes it is a bug. I think I messed it up while merging newer changes with an older version where I was using size in bytes to optimize. Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] files: files struct with RCUDipankar Sarma2005-09-091-131/+258
| | | | | | | | | | | | | | | | | | | | | Patch to eliminate struct files_struct.file_lock spinlock on the reader side and use rcu refcounting rcuref_xxx api for the f_count refcounter. The updates to the fdtable are done by allocating a new fdtable structure and setting files->fdt to point to the new structure. The fdtable structure is protected by RCU thereby allowing lock-free lookup. For fd arrays/sets that are vmalloced, we use keventd to free them since RCU callbacks can't sleep. A global list of fdtable to be freed is not scalable, so we use a per-cpu list. If keventd is already handling the current cpu's work, we use a timer to defer queueing of that work. Since the last publication, this patch has been re-written to avoid using explicit memory barriers and use rcu_assign_pointer(), rcu_dereference() premitives instead. This required that the fd information is kept in a separate structure (fdtable) and updated atomically. Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] files: break up files structDipankar Sarma2005-09-091-17/+25
| | | | | | | | | | | | | | | In order for the RCU to work, the file table array, sets and their sizes must be updated atomically. Instead of ensuring this through too many memory barriers, we put the arrays and their sizes in a separate structure. This patch takes the first step of putting the file table elements in a separate structure fdtable that is embedded withing files_struct. It also changes all the users to refer to the file table using files_fdtable() macro. Subsequent applciation of RCU becomes easier after this. Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com> Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* Linux-2.6.12-rc2v2.6.12-rc2Linus Torvalds2005-04-161-0/+254
Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!
OpenPOWER on IntegriCloud