summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
* UBIFS: improve statfs reportingArtem Bityutskiy2008-08-313-32/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make free space calculation less pessimistic and more realistic, which in turn improves 'statfs()' reports. Now it lies by 10%-20%, instead of 20%-30% (10% more honest). Results of "freespace" test (120MiB volume, 16KiB LEB size, 512 bytes page size). Before the change: freespace: Test 1: fill the space we have 3 times freespace: was free: 78274560 bytes 74.6 MiB, wrote: 96489472 bytes 92.0 MiB, delta: 18214912 bytes 17.4 MiB, wrote 23.3% more than predicted freespace: was free: 76754944 bytes 73.2 MiB, wrote: 96493568 bytes 92.0 MiB, delta: 19738624 bytes 18.8 MiB, wrote 25.7% more than predicted freespace: was free: 76759040 bytes 73.2 MiB, wrote: 96489472 bytes 92.0 MiB, delta: 19730432 bytes 18.8 MiB, wrote 25.7% more than predicted freespace: Test 1 finished freespace: Test 2: gradually lessen amount of free space and fill the FS freespace: do 10 steps, lessen free space by 6977722 bytes 6.7 MiB each time freespace: was free: 72273920 bytes 68.9 MiB, wrote: 88891392 bytes 84.8 MiB, delta: 16617472 bytes 15.8 MiB, wrote 23.0% more than predicted freespace: was free: 66154496 bytes 63.1 MiB, wrote: 81506304 bytes 77.7 MiB, delta: 15351808 bytes 14.6 MiB, wrote 23.2% more than predicted freespace: was free: 58732544 bytes 56.0 MiB, wrote: 72572928 bytes 69.2 MiB, delta: 13840384 bytes 13.2 MiB, wrote 23.6% more than predicted freespace: was free: 51552256 bytes 49.2 MiB, wrote: 63754240 bytes 60.8 MiB, delta: 12201984 bytes 11.6 MiB, wrote 23.7% more than predicted freespace: was free: 44404736 bytes 42.3 MiB, wrote: 54943744 bytes 52.4 MiB, delta: 10539008 bytes 10.1 MiB, wrote 23.7% more than predicted freespace: was free: 37285888 bytes 35.6 MiB, wrote: 46161920 bytes 44.0 MiB, delta: 8876032 bytes 8.5 MiB, wrote 23.8% more than predicted freespace: was free: 30171136 bytes 28.8 MiB, wrote: 37384192 bytes 35.7 MiB, delta: 7213056 bytes 6.9 MiB, wrote 23.9% more than predicted freespace: was free: 23048192 bytes 22.0 MiB, wrote: 28606464 bytes 27.3 MiB, delta: 5558272 bytes 5.3 MiB, wrote 24.1% more than predicted freespace: was free: 15941632 bytes 15.2 MiB, wrote: 19828736 bytes 18.9 MiB, delta: 3887104 bytes 3.7 MiB, wrote 24.4% more than predicted freespace: was free: 8830976 bytes 8.4 MiB, wrote: 11063296 bytes 10.6 MiB, delta: 2232320 bytes 2.1 MiB, wrote 25.3% more than predicted freespace: Test 2 finished freespace: Test 3: gradually lessen amount of free space by trashing and fill the FS freespace: do 10 steps, lessen free space by 6985541 bytes 6.7 MiB each time freespace: trashing: was free: 76840960 bytes 73.3 MiB, need free: 6985550 bytes 6.7 MiB, files created: 248311, delete 225737 (90.9% of them) freespace: was free: 65228800 bytes 62.2 MiB, wrote: 82530304 bytes 78.7 MiB, delta: 17301504 bytes 16.5 MiB, wrote 26.5% more than predicted freespace: trashing: was free: 74485760 bytes 71.0 MiB, need free: 13971091 bytes 13.3 MiB, files created: 248712, delete 202061 (81.2% of them) freespace: was free: 55025664 bytes 52.5 MiB, wrote: 71925760 bytes 68.6 MiB, delta: 16900096 bytes 16.1 MiB, wrote 30.7% more than predicted freespace: trashing: was free: 75550720 bytes 72.1 MiB, need free: 20956632 bytes 20.0 MiB, files created: 248849, delete 179822 (72.3% of them) freespace: was free: 46669824 bytes 44.5 MiB, wrote: 63197184 bytes 60.3 MiB, delta: 16527360 bytes 15.8 MiB, wrote 35.4% more than predicted freespace: trashing: was free: 76214272 bytes 72.7 MiB, need free: 27942173 bytes 26.6 MiB, files created: 248789, delete 157576 (63.3% of them) freespace: was free: 39129088 bytes 37.3 MiB, wrote: 55164928 bytes 52.6 MiB, delta: 16035840 bytes 15.3 MiB, wrote 41.0% more than predicted freespace: trashing: was free: 77398016 bytes 73.8 MiB, need free: 34927714 bytes 33.3 MiB, files created: 248711, delete 136474 (54.9% of them) freespace: was free: 32325632 bytes 30.8 MiB, wrote: 48234496 bytes 46.0 MiB, delta: 15908864 bytes 15.2 MiB, wrote 49.2% more than predicted freespace: trashing: was free: 75796480 bytes 72.3 MiB, need free: 41913255 bytes 40.0 MiB, files created: 248674, delete 111164 (44.7% of them) freespace: was free: 25079808 bytes 23.9 MiB, wrote: 40775680 bytes 38.9 MiB, delta: 15695872 bytes 15.0 MiB, wrote 62.6% more than predicted freespace: trashing: was free: 78209024 bytes 74.6 MiB, need free: 48898796 bytes 46.6 MiB, files created: 248708, delete 93207 (37.5% of them) freespace: was free: 20582400 bytes 19.6 MiB, wrote: 34844672 bytes 33.2 MiB, delta: 14262272 bytes 13.6 MiB, wrote 69.3% more than predicted freespace: trashing: was free: 77328384 bytes 73.7 MiB, need free: 55884337 bytes 53.3 MiB, files created: 248644, delete 68951 (27.7% of them) freespace: was free: 14368768 bytes 13.7 MiB, wrote: 28278784 bytes 27.0 MiB, delta: 13910016 bytes 13.3 MiB, wrote 96.8% more than predicted freespace: trashing: was free: 77434880 bytes 73.8 MiB, need free: 62869878 bytes 60.0 MiB, files created: 248640, delete 46767 (18.8% of them) freespace: was free: 8286208 bytes 7.9 MiB, wrote: 21811200 bytes 20.8 MiB, delta: 13524992 bytes 12.9 MiB, wrote 163.2% more than predicted freespace: trashing: was free: 77856768 bytes 74.2 MiB, need free: 69855419 bytes 66.6 MiB, files created: 248576, delete 25546 (10.3% of them) freespace: was free: 5570560 bytes 5.3 MiB, wrote: 8187904 bytes 7.8 MiB, delta: 2617344 bytes 2.5 MiB, wrote 47.0% more than predicted freespace: Test 3 finished freespace: finished successfully After the change: freespace: Test 1: fill the space we have 3 times freespace: was free: 85204992 bytes 81.3 MiB, wrote: 96489472 bytes 92.0 MiB, delta: 11284480 bytes 10.8 MiB, wrote 13.2% more than predicted freespace: was free: 83554304 bytes 79.7 MiB, wrote: 96489472 bytes 92.0 MiB, delta: 12935168 bytes 12.3 MiB, wrote 15.5% more than predicted freespace: was free: 83554304 bytes 79.7 MiB, wrote: 96493568 bytes 92.0 MiB, delta: 12939264 bytes 12.3 MiB, wrote 15.5% more than predicted freespace: Test 1 finished freespace: Test 2: gradually lessen amount of free space and fill the FS freespace: do 10 steps, lessen free space by 7596218 bytes 7.2 MiB each time freespace: was free: 78675968 bytes 75.0 MiB, wrote: 88903680 bytes 84.8 MiB, delta: 10227712 bytes 9.8 MiB, wrote 13.0% more than predicted freespace: was free: 72015872 bytes 68.7 MiB, wrote: 81514496 bytes 77.7 MiB, delta: 9498624 bytes 9.1 MiB, wrote 13.2% more than predicted freespace: was free: 63938560 bytes 61.0 MiB, wrote: 72589312 bytes 69.2 MiB, delta: 8650752 bytes 8.2 MiB, wrote 13.5% more than predicted freespace: was free: 56127488 bytes 53.5 MiB, wrote: 63762432 bytes 60.8 MiB, delta: 7634944 bytes 7.3 MiB, wrote 13.6% more than predicted freespace: was free: 48336896 bytes 46.1 MiB, wrote: 54935552 bytes 52.4 MiB, delta: 6598656 bytes 6.3 MiB, wrote 13.7% more than predicted freespace: was free: 40587264 bytes 38.7 MiB, wrote: 46157824 bytes 44.0 MiB, delta: 5570560 bytes 5.3 MiB, wrote 13.7% more than predicted freespace: was free: 32841728 bytes 31.3 MiB, wrote: 37384192 bytes 35.7 MiB, delta: 4542464 bytes 4.3 MiB, wrote 13.8% more than predicted freespace: was free: 25100288 bytes 23.9 MiB, wrote: 28618752 bytes 27.3 MiB, delta: 3518464 bytes 3.4 MiB, wrote 14.0% more than predicted freespace: was free: 17342464 bytes 16.5 MiB, wrote: 19841024 bytes 18.9 MiB, delta: 2498560 bytes 2.4 MiB, wrote 14.4% more than predicted freespace: was free: 9605120 bytes 9.2 MiB, wrote: 11063296 bytes 10.6 MiB, delta: 1458176 bytes 1.4 MiB, wrote 15.2% more than predicted freespace: Test 2 finished freespace: Test 3: gradually lessen amount of free space by trashing and fill the FS freespace: do 10 steps, lessen free space by 7606272 bytes 7.3 MiB each time freespace: trashing: was free: 83668992 bytes 79.8 MiB, need free: 7606272 bytes 7.3 MiB, files created: 248297, delete 225724 (90.9% of them) freespace: was free: 70803456 bytes 67.5 MiB, wrote: 82485248 bytes 78.7 MiB, delta: 11681792 bytes 11.1 MiB, wrote 16.5% more than predicted freespace: trashing: was free: 81080320 bytes 77.3 MiB, need free: 15212544 bytes 14.5 MiB, files created: 248711, delete 202047 (81.2% of them) freespace: was free: 59867136 bytes 57.1 MiB, wrote: 71897088 bytes 68.6 MiB, delta: 12029952 bytes 11.5 MiB, wrote 20.1% more than predicted freespace: trashing: was free: 82243584 bytes 78.4 MiB, need free: 22818816 bytes 21.8 MiB, files created: 248866, delete 179817 (72.3% of them) freespace: was free: 50905088 bytes 48.5 MiB, wrote: 63168512 bytes 60.2 MiB, delta: 12263424 bytes 11.7 MiB, wrote 24.1% more than predicted freespace: trashing: was free: 83402752 bytes 79.5 MiB, need free: 30425088 bytes 29.0 MiB, files created: 248920, delete 158114 (63.5% of them) freespace: was free: 42651648 bytes 40.7 MiB, wrote: 55406592 bytes 52.8 MiB, delta: 12754944 bytes 12.2 MiB, wrote 29.9% more than predicted freespace: trashing: was free: 84402176 bytes 80.5 MiB, need free: 38031360 bytes 36.3 MiB, files created: 248709, delete 136641 (54.9% of them) freespace: was free: 35233792 bytes 33.6 MiB, wrote: 48250880 bytes 46.0 MiB, delta: 13017088 bytes 12.4 MiB, wrote 36.9% more than predicted freespace: trashing: was free: 82530304 bytes 78.7 MiB, need free: 45637632 bytes 43.5 MiB, files created: 248778, delete 111208 (44.7% of them) freespace: was free: 27287552 bytes 26.0 MiB, wrote: 40267776 bytes 38.4 MiB, delta: 12980224 bytes 12.4 MiB, wrote 47.6% more than predicted freespace: trashing: was free: 85114880 bytes 81.2 MiB, need free: 53243904 bytes 50.8 MiB, files created: 248508, delete 93052 (37.4% of them) freespace: was free: 22437888 bytes 21.4 MiB, wrote: 35328000 bytes 33.7 MiB, delta: 12890112 bytes 12.3 MiB, wrote 57.4% more than predicted freespace: trashing: was free: 84103168 bytes 80.2 MiB, need free: 60850176 bytes 58.0 MiB, files created: 248637, delete 68743 (27.6% of them) freespace: was free: 15536128 bytes 14.8 MiB, wrote: 28319744 bytes 27.0 MiB, delta: 12783616 bytes 12.2 MiB, wrote 82.3% more than predicted freespace: trashing: was free: 84357120 bytes 80.4 MiB, need free: 68456448 bytes 65.3 MiB, files created: 248567, delete 46852 (18.8% of them) freespace: was free: 9015296 bytes 8.6 MiB, wrote: 22044672 bytes 21.0 MiB, delta: 13029376 bytes 12.4 MiB, wrote 144.5% more than predicted freespace: trashing: was free: 84942848 bytes 81.0 MiB, need free: 76062720 bytes 72.5 MiB, files created: 248636, delete 25993 (10.5% of them) freespace: was free: 6086656 bytes 5.8 MiB, wrote: 8331264 bytes 7.9 MiB, delta: 2244608 bytes 2.1 MiB, wrote 36.9% more than predicted freespace: Test 3 finished freespace: finished successfully Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
* UBIFS: remove incorrect index space checkArtem Bityutskiy2008-08-311-14/+1
| | | | | | | | | | When we report free space to user-space, we should not report 0 if the amount of empty LEBs is too low, because they would be produced by GC when needed. Thus, just call 'ubifs_calc_available()' straight away which would take 'min_idx_lebs' into account anyway. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
* UBIFS: push empty flash hack downArtem Bityutskiy2008-08-312-15/+11
| | | | | | | | | | | We have a hack which forces the amount of flash space to be equivalent to 'c->blocks_cnt' in case of empty FS. This is to make users happy and see '%0' used in 'df' when they mount an empty FS. This hack is not needed in 'ubifs_calc_available()', but it is only needed the caller, in 'ubifs_budg_get_free_space()'. So push it down there. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
* UBIFS: do not update min_idx_lebs in stafsArtem Bityutskiy2008-08-311-1/+0
| | | | | | | This is bad because the rest of the code should not depend on it, and this may hide bugss, instead of revealing them. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
* UBIFS: allow for racing between GC and TNCAdrian Hunter2008-08-254-51/+87
| | | | | | | | | | | | The TNC mutex is unlocked prematurely when reading leaf nodes with non-hashed keys. This is unsafe because the node may be moved by garbage collection and the eraseblock unmapped, although that has never actually happened during stress testing. This patch fixes the flaw by detecting the race and retrying with the TNC mutex locked. Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
* UBIFS: always read hashed-key nodes under TNC mutexAdrian Hunter2008-08-251-6/+1
| | | | | | | | | | Leaf-nodes that have a hashed key are stored in the leaf-node-cache (LNC) which is protected by the TNC mutex. Consequently, when reading a leaf node with a hashed key (i.e. directory entries, xattr entries) the TNC mutex is always required. Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
* UBIFS: fix zero-length truncationsArtem Bityutskiy2008-08-212-5/+16
| | | | | | | | | | | Always allow truncations to zero, even if budgeting thinks there is no space. UBIFS reserves some space for deletions anyway. Otherwise, the following happans: 1. create a file, and write as much as possible there, until ENOSPC 2. truncate the file, which fails with ENOSPC, which is not good. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
* cramfs: fix named-pipe handlingAl Viro2008-08-201-46/+38
| | | | | | | | | | | | | | | | | | | After commit a97c9bf33f4612e2aed6f000f6b1d268b6814f3c (fix cramfs making duplicate entries in inode cache) in kernel 2.6.14, named-pipe on cramfs does not work properly. It seems the commit make all named-pipe on cramfs share their inode (and named-pipe buffer). Make ..._test() refuse to merge inodes with ->i_ino == 1, take inode setup back to get_cramfs_inode() and make ->drop_inode() evict ones with ->i_ino == 1 immediately. Reported-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@kernel.org> [2.6.14 and later] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fix setpriority(PRIO_PGRP) thread iterator breakageKen Chen2008-08-201-4/+4
| | | | | | | | | | | | | | | | | | | | | When user calls sys_setpriority(PRIO_PGRP ...) on a NPTL style multi-LWP process, only the task leader of the process is affected, all other sibling LWP threads didn't receive the setting. The problem was that the iterator used in sys_setpriority() only iteartes over one task for each process, ignoring all other sibling thread. Introduce a new macro do_each_pid_thread / while_each_pid_thread to walk each thread of a process. Convert 4 call sites in {set/get}priority and ioprio_{set/get}. Signed-off-by: Ken Chen <kenchen@google.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* binfmt_misc: fix false -ENOEXEC when coupled with other binary handlersPavel Emelyanov2008-08-201-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | In case the binfmt_misc binary handler is registered *before* the e.g. script one (when for example being compiled as a module) the following situation may occur: 1. user launches a script, whose interpreter is a misc binary; 2. the load_misc_binary sets the misc_bang and returns -ENOEVEC, since the binary is a script; 3. the load_script_binary loads one and calls for search_binary_hander to run the interpreter; 4. the load_misc_binary is called again, but refuses to load the binary due to misc_bang bit set. The fix is to move the misc_bang setting lower - prior to the actual call to the search_binary_handler. Caused by the commit 3a2e7f47 (binfmt_misc.c: avoid potential kernel stack overflow) Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Reported-by: Kirill A. Shutemov <kirill@shutemov.name> Tested-by: Kirill A. Shutemov <kirill@shutemov.name> Cc: <stable@kernel.org> [2.6.26.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* /proc/self/maps doesn't display the real file offsetClement Calmels2008-08-202-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This addresses http://bugzilla.kernel.org/show_bug.cgi?id=11318 In function show_map (file: fs/proc/task_mmu.c), if vma->vm_pgoff > 2^20 than (vma->vm_pgoff << PAGE_SIZE) is greater than 2^32 (with PAGE_SIZE equal to 4096 (i.e. 2^12). The next seq_printf use an unsigned long for the conversion of (vma->vm_pgoff << PAGE_SIZE), as a result the offset value displayed in /proc/self/maps is truncated if the page offset is greater than 2^20. A test that shows this issue: #define _GNU_SOURCE #include <sys/types.h> #include <sys/stat.h> #include <sys/mman.h> #include <stdlib.h> #include <stdio.h> #include <fcntl.h> #include <unistd.h> #include <string.h> #define PAGE_SIZE (getpagesize()) #if __i386__ # define U64_STR "%llx" #elif __x86_64 # define U64_STR "%lx" #else # error "Architecture Unsupported" #endif int main(int argc, char *argv[]) { int fd; char *addr; off64_t offset = 0x10000000; char *filename = "/dev/zero"; fd = open(filename, O_RDONLY); if (fd < 0) { perror("open"); return 1; } offset *= 0x10; printf("offset = " U64_STR "\n", offset); addr = (char*)mmap64(NULL, PAGE_SIZE, PROT_READ, MAP_PRIVATE, fd, offset); if ((void*)addr == MAP_FAILED) { perror("mmap64"); return 1; } { FILE *fmaps; char *line = NULL; size_t len = 0; ssize_t read; size_t filename_len = strlen(filename); fmaps = fopen("/proc/self/maps", "r"); if (!fmaps) { perror("fopen"); return 1; } while ((read = getline(&line, &len, fmaps)) != -1) { if ((read > filename_len + 1) && (strncmp(&line[read - filename_len - 1], filename, filename_len) == 0)) printf("%s", line); } if (line) free(line); fclose(fmaps); } close(fd); return 0; } [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Clement Calmels <cboulte@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'sh/for-2.6.27' of ↵Linus Torvalds2008-08-201-1/+3
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6 * 'sh/for-2.6.27' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6: sh: Provide a FLAT_PLAT_INIT() definition. binfmt_flat: Stub in a FLAT_PLAT_INIT(). video: export sh_mobile_lcdc panel size sh: select memchunk size using kernel cmdline sh: export sh7723 VEU as VEU2H input: migor_ts compile and detection fix sh: remove MSTPCR defines from Migo-R header file sh: Update sh7763rdp defconfig sh: Add support sh7760fb to sh7763rdp board sh: Add support sh_eth to sh7763rdp board sh: Disable 64kB hugetlbpage size when using 64kB PAGE_SIZE. sh: Don't export __{s,u}divsi3_i4i from SH-2 libgcc. fix SH7705_CACHE_32KB compilation sh: mach-x3proto: Fix up smc91x platform data.
| * binfmt_flat: Stub in a FLAT_PLAT_INIT().Takashi YOSHII2008-08-111-1/+3
| | | | | | | | | | | | | | | | | | This provides a FLAT_PLAT_INIT() arch hook for platforms that need to set up specific register state prior to calling in to the process, as per ELF_PLAT_INIT(). Signed-off-by: Takashi YOSHII <yoshii.takashi@renesas.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
* | vfat: fix 'sync' mount deadlock due to BKL->lock_super conversionLinus Torvalds2008-08-201-7/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There was another FAT BKL conversion deadlock reported by Bart Trojanowski due to the BKL being used as a recursive lock by FAT, which was missed because it only triggers with 'sync' (or 'dirsync') mounts. The recursion worked for the BKL, but after the conversion to lock_super (which uses a mutex), it just deadlocks. Thanks to Bart for debugging this and testing the fix. The lock debugging information from the original report: ============================================= [ INFO: possible recursive locking detected ] 2.6.27-rc3-bisect-00448-ga7f5aaf #16 --------------------------------------------- mv/4020 is trying to acquire lock: (&type->s_lock_key#9){--..}, at: [<c01a90fe>] lock_super+0x1e/0x20 but task is already holding lock: (&type->s_lock_key#9){--..}, at: [<c01a90fe>] lock_super+0x1e/0x20 other info that might help us debug this: 3 locks held by mv/4020: #0: (&sb->s_type->i_mutex_key#9/1){--..}, at: [<c01b2336>] do_unlinkat+0x66/0x140 #1: (&sb->s_type->i_mutex_key#9){--..}, at: [<c01b0954>] vfs_unlink+0x84/0x110 #2: (&type->s_lock_key#9){--..}, at: [<c01a90fe>] lock_super+0x1e/0x20 stack backtrace: Pid: 4020, comm: mv Not tainted 2.6.27-rc3-bisect-00448-ga7f5aaf #16 [<c014e694>] validate_chain+0x984/0xea0 [<c0108d70>] ? native_sched_clock+0x0/0xf0 [<c014ee9c>] __lock_acquire+0x2ec/0x9b0 [<c014f5cf>] lock_acquire+0x6f/0x90 [<c01a90fe>] ? lock_super+0x1e/0x20 [<c044e5fd>] mutex_lock_nested+0xad/0x300 [<c01a90fe>] ? lock_super+0x1e/0x20 [<c01a90fe>] ? lock_super+0x1e/0x20 [<c01a90fe>] lock_super+0x1e/0x20 [<f8b3a700>] fat_write_inode+0x60/0x2b0 [fat] [<c0450878>] ? _spin_unlock_irqrestore+0x48/0x80 [<f8b3a953>] ? fat_sync_inode+0x3/0x20 [fat] [<f8b3a962>] fat_sync_inode+0x12/0x20 [fat] [<f8b37c7e>] fat_remove_entries+0xbe/0x120 [fat] [<f8b422ef>] vfat_unlink+0x5f/0x90 [vfat] [<f8b42290>] ? vfat_unlink+0x0/0x90 [vfat] [<c01b0968>] vfs_unlink+0x98/0x110 [<c01b2400>] do_unlinkat+0x130/0x140 [<c016a8f5>] ? audit_syscall_entry+0x105/0x150 [<c01b253b>] sys_unlinkat+0x3b/0x40 [<c01040d3>] sysenter_do_call+0x12/0x3f ======================= where the deadlock is due to the nesting of lock_super from vfat_unlink to fat_write_inode: - do_unlinkat - vfs_unlink - vfat_unlink * lock_super - fat_remove_entries - fat_sync_inode - fat_write_inode * lock_super and the fix is to simply remove the use of lock_super() in fat_write_inode. The lock_super() there had been just an automatic conversion of the kernel lock to the superblock lock, but no locking was actually needed there, since the code in fat_write_inode already protected all relevant accesses with a spinlock (sbi->inode_hash_lock to be exact). The only code inside the BKL (and thus the superblock lock) was accesses tp local variables or calls to functions that have long been SMP-safe (i.e. sb_bread, mark_buffe_dirty and brlese). Bart reports: "Looks good. I ran 10 parallel processes creating 1M files truncating them, writing to them again and then deleting them. This patch fixes the issue I ran into. Signed-off-by: Bart Trojanowski <bart@jukie.net>" Reported-and-tested-by: Bart Trojanowski <bart@jukie.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6Linus Torvalds2008-08-152-0/+3
|\ \ | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: [CIFS] mount of IPC$ breaks with iget patch [CIFS] remove trailing whitespace [CIFS] if get root inode fails during mount, cleanup tree connection
| * | [CIFS] mount of IPC$ breaks with iget patchSteve French2008-08-141-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In looking at network named pipe support on cifs, I noticed that Dave Howell's iget patch: iget: stop CIFS from using iget() and read_inode() broke mounts to IPC$ (the interprocess communication share), and don't handle the error case (when getting info on the root inode fails). Thanks to Gunter who noted a typo in a debug line in the original version of this patch. CC: David Howells <dhowells@redhat.com> CC: Gunter Kukkukk <linux@kukkukk.com> CC: Stable Kernel <stable@kernel.org> Signed-off-by: Steve French <sfrench@us.ibm.com>
| * | [CIFS] remove trailing whitespaceSteve French2008-08-111-1/+1
| | | | | | | | | | | | Signed-off-by: Steve French <sfrench@us.ibm.com>
| * | [CIFS] if get root inode fails during mount, cleanup tree connectionSteve French2008-08-111-0/+2
| |/ | | | | | | | | CC: Stable Kernel <stable@kernel.org> Signed-off-by: Steve French <sfrench@us.ibm.com>
* | Merge branch 'linux-next' of git://git.infradead.org/~dedekind/ubifs-2.6Linus Torvalds2008-08-1517-243/+328
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'linux-next' of git://git.infradead.org/~dedekind/ubifs-2.6: (29 commits) UBIFS: xattr bugfixes UBIFS: remove unneeded check UBIFS: few commentary fixes UBIFS: fix budgeting request alignment in xattr code UBIFS: improve arguments checking in debugging messages UBIFS: always set i_generation to 0 UBIFS: correct spelling of "thrice". UBIFS: support splice_write UBIFS: minor tweaks in commit UBIFS: reserve more space for index UBIFS: print pid in dump function UBIFS: align inode data to eight UBIFS: improve budgeting checks UBIFS: correct orphan deletion order UBIFS: fix typos in comments UBIFS: do not union creat_sqnum and del_cmtno UBIFS: optimize deletions UBIFS: increment commit number earlier UBIFS: remove another unneeded function parameter UBIFS: remove unneeded function parameter ...
| * | UBIFS: xattr bugfixesArtem Bityutskiy2008-08-142-28/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Xattr code has not been tested for a while and there were serveral bugs. One of them is using wrong inode in 'ubifs_jnl_change_xattr()'. The other is a deadlock in 'ubifs_setxattr()': the i_mutex is locked in 'cap_inode_need_killpriv()' path, so deadlock happens when 'ubifs_setxattr()' tries to lock it again. Thanks to Zoltan Sogor for finding these bugs. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
| * | UBIFS: remove unneeded checkArtem Bityutskiy2008-08-131-9/+1
| | | | | | | | | | | | | | | | | | | | | | | | Commit d70b67c8bc72ee23b55381bd6a884f4796692f77 fixed VFS and it never calls FS lookup function in deleted directories now. We may remove corresponding UBIFS check. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
| * | UBIFS: few commentary fixesArtem Bityutskiy2008-08-132-6/+4
| | | | | | | | | | | | Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
| * | UBIFS: fix budgeting request alignment in xattr codeZoltan Sogor2008-08-131-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | Data length has to be aligned in the budgeting request. Code in xattr.c did not do this. Signed-off-by: Zoltan Sogor <weth@inf.u-szeged.hu> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
| * | UBIFS: improve arguments checking in debugging messagesArtem Bityutskiy2008-08-131-74/+69
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use "if (0) printk()" construct in debugging print macros to make the debugging messages be checked even if debugging is off. This patch also removes some unneeded spaces and blank lines. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
| * | UBIFS: always set i_generation to 0Adrian Hunter2008-08-133-6/+1
| | | | | | | | | | | | | | | | | | | | | | | | UBIFS does not presently re-use inode numbers, so leaving i_generation zero is most appropriate for now. Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
| * | UBIFS: correct spelling of "thrice".Adrian Hunter2008-08-132-3/+3
| | | | | | | | | | | | Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
| * | UBIFS: support splice_writeZoltan Sogor2008-08-131-0/+1
| | | | | | | | | | | | | | | Signed-off-by: Zoltan Sogor <weth@inf.u-szeged.hu> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
| * | UBIFS: minor tweaks in commitArtem Bityutskiy2008-08-131-19/+18
| | | | | | | | | | | | | | | | | | No functional changes, just lessen the amount of indentations. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
| * | UBIFS: reserve more space for indexArtem Bityutskiy2008-08-134-9/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | At the moment UBIFS reserves twice old index size space for the index. But this is not enough in some cases, because if the indexing node are very fragmented and there are many small gaps, while the dirty index has big znodes - in-the-gaps method would fail. Thus, reserve trise as more, in which case we are guaranteed that we can commit in any case. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
| * | UBIFS: print pid in dump functionArtem Bityutskiy2008-08-131-10/+10
| | | | | | | | | | | | | | | | | | | | | Useful when something fails and there are many processes racing. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
| * | UBIFS: align inode data to eightArtem Bityutskiy2008-08-135-9/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | UBIFS aligns node lengths to 8, so budgeting has to do the same. Well, direntry, inode, and page budgets are already aligned, but not inode data budget (e.g., data in special devices or symlinks). Do this for inode data as well. Also, add corresponding debugging checks. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
| * | UBIFS: improve budgeting checksArtem Bityutskiy2008-08-132-1/+19
| | | | | | | | | | | | | | | | | | | | | | | | Budgeting is a crucial UBIFS subsystem - add more assertions to improve requests checking. This is not compiled in when UBIFS debugging is disabled. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
| * | UBIFS: correct orphan deletion orderAdrian Hunter2008-08-131-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | The debug function that checks orphans, does so using the TNC mutex. That means it will not see a correct picture if the inode is removed from the orphan tree before it is removed from TNC. Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
| * | UBIFS: fix typos in commentsAdrian Hunter2008-08-131-5/+5
| | | | | | | | | | | | Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
| * | UBIFS: do not union creat_sqnum and del_cmtnoAdrian Hunter2008-08-131-4/+2
| | | | | | | | | | | | | | | | | | | | | The values in these two fields need to be preserved independently and so a union cannot be used. Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
| * | UBIFS: optimize deletionsArtem Bityutskiy2008-08-133-5/+73
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Every time anything is deleted, UBIFS writes the deletion inode node twice - once in 'ubifs_jnl_update()' and the second time in 'ubifs_jnl_write_inode()'. However, the second write is not needed if no commit happened after 'ubifs_jnl_update()'. This patch checks that condition and avoids writing the deletion inode for the second time. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
| * | UBIFS: increment commit number earlierArtem Bityutskiy2008-08-134-5/+7
| | | | | | | | | | | | | | | | | | | | | | | | Increment the commit number at the beginnig of the commit, instead of doing this after the commit. This is needed for further optimizations. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
| * | UBIFS: remove another unneeded function parameterArtem Bityutskiy2008-08-131-16/+14
| | | | | | | | | | | | | | | | | | | | | The 'last_reference' parameter of 'pack_inode()' is not really needed because 'inode->i_nlink' may be tested instead. Zap it. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
| * | UBIFS: remove unneeded function parameterArtem Bityutskiy2008-08-133-16/+10
| | | | | | | | | | | | | | | | | | | | | | | | Simplify 'ubifs_jnl_write_inode()' by removing the 'deletion' parameter which is not really needed because we may test inode->i_nlink and check whether this is a deletion or not. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
| * | UBIFS: do not write orphans backArtem Bityutskiy2008-08-131-5/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Orphan inodes are deleted inodes which will disappear after FS re-mount. There is not need to write orphan inodes back, because they are not needed on the flash media. So optimize orphans a little by not writing them back. Just mark them as clean, free the budget, and report success to VFS. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
| * | UBIFS: make ubifs_ro_mode() not inlineAdrian Hunter2008-08-133-14/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We use ubifs_ro_mode() quite a lot, and not in fast-path, so there is no reason to blow the code up by having it inlined. Also, we usually want R/O mode change to be seen to other CPUs as soon as possible, so when we make this a function call, we will automatically have a memory barrier. Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
| * | UBIFS: ensure UBIFS switches to read-only on errorAdrian Hunter2008-08-131-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | UBI transparently handles write errors by automatically copying and remapping the affected eraseblock. If UBI is unable to do that, for example its pool of eraseblocks reserved for bad block handling is empty, then the error is propagated to UBIFS. UBIFS must protect the media from falling into an inconsistent state by immediately switching to read-only mode. In the case of log updates, this was not being done. Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
| * | UBIFS: fix error return in failure modeAdrian Hunter2008-08-131-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | UBIFS recovery testing debug facility simulates media failures. When simulating an IO error, the error code returned must be -EIO but it was not always if the user switched off the debug recovery testing option at the same time. Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
| * | UBIFS: free budget in delete_inode as wellArtem Bityutskiy2008-08-131-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | Although the inode is marked as clean when it is being deleted, it might stay and be used as orphan, and be marked as dirty. So we have to free the budget when we delete it. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
| * | UBIFS: improve debuggingArtem Bityutskiy2008-08-132-3/+6
| | | | | | | | | | | | | | | | | | | | | 1. Print inode mode in some of debugging messages 2. Add few more useful assertions Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
| * | UBIFS: fix budgeting calculationsArtem Bityutskiy2008-08-132-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The 'ubifs_release_dirty_inode_budget()' was buggy and incorrectly freed the budget, which led to not freeing all dirty data budget. This patch fixes that. Also, this patch fixes ubifs_mkdir() which passed 1 in dirty_ino_d, which makes no sense. Well, it is harmless though. Also, add few more useful assertions. And improve few debugging messages. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
| * | UBIFS: print volume name as wellArtem Bityutskiy2008-08-131-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | We encouredge people to mount using volume name, not device numbers. So print the name of the mounted UBI volume, not just IDs. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
* | | omfs: fix oops when file metadata is corruptedBob Copeland2008-08-152-9/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A fuzzed fileystem image failed with OMFS when the extent count was used in a loop without being checked against the max number of extents. It also provoked a signed division for an array index that was checked as if unsigned, leading to index by -1. omfsck will be updated to fix these cases, in the meantime bail out gracefully. Reported-by: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | omfs: fix potential oops when directory size is corruptedBob Copeland2008-08-151-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Testing with a modified fsfuzzer reveals a couple of locations in omfs where filesystem variables are ultimately used as loop counters with insufficient sanity checking. In this case, dir->i_size is used to compute the number of buckets in the directory hash. If too large, readdir will overrun a buffer. Since it's an invariant that dir->i_size is equal to the sysblock size, and we already sanity check that, just use that value instead. This fixes the following oops: BUG: unable to handle kernel paging request at c978e004 IP: [<c032298e>] omfs_readdir+0x18e/0x32f Oops: 0000 [#1] PREEMPT DEBUG_PAGEALLOC Modules linked in: Pid: 4796, comm: ls Not tainted (2.6.27-rc2 #12) EIP: 0060:[<c032298e>] EFLAGS: 00010287 CPU: 0 EIP is at omfs_readdir+0x18e/0x32f EAX: c978d000 EBX: 00000000 ECX: cbfcfaf8 EDX: cb2cf100 ESI: 00001000 EDI: 00000800 EBP: cb2d3f68 ESP: cb2d3f0c DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068 Process ls (pid: 4796, ti=cb2d3000 task=cb175f40 task.ti=cb2d3000) Stack: 00000002 00000000 00000000 c018a820 cb2d3f94 cb2cf100 cbfb0000 ffffff10 cbfb3b80 cbfcfaf8 000001c9 00000a09 00000000 00000000 00000000 cbfcfbc8 c9697000 cbfb3b80 22222222 00001000 c08e6cd0 cb2cf100 cbfb3b80 cb2d3f88 Call Trace: [<c018a820>] ? filldir64+0x0/0xcd [<c018a9f2>] ? vfs_readdir+0x56/0x82 [<c018a820>] ? filldir64+0x0/0xcd [<c018aa7c>] ? sys_getdents64+0x5e/0xa0 [<c01038bd>] ? sysenter_do_call+0x12/0x31 ======================= Code: 00 89 f0 89 f3 0f ac f8 14 81 e3 ff ff 0f 00 48 8d 14 c5 b8 01 00 00 89 45 cc 89 55 f0 e9 8c 01 00 00 8b 4d c8 8b 75 f0 8b 41 18 <8b> 54 30 04 8b 04 30 31 f6 89 5d dc 89 d1 8b 55 b8 0f c8 0f c9 Reported-by: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | fs/inode.c: properly init address_space->writeback_indexChris Mason2008-08-151-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | write_cache_pages() uses i_mapping->writeback_index to pick up where it left off the last time a given inode was found by pdflush or balance_dirty_pages (or anyone else who sets wbc->range_cyclic) alloc_inode() should set it to a sane value so that writeback doesn't start in the middle of a file. It is somewhat difficult to notice the bug since write_cache_pages will loop around to the start of the file and the elevator helps hide the resulting seeks. For whatever reason, Btrfs hits this often. Unpatched, untarring 30 copies of the linux kernel in series runs at 47MB/s on a single sata drive. With this fix, it jumps to 62MB/s. Signed-off-by: Chris Mason <chris.mason@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
OpenPOWER on IntegriCloud