summaryrefslogtreecommitdiffstats
path: root/kernel
Commit message (Collapse)AuthorAgeFilesLines
* [PATCH] swsusp: Support i386 systems with PAE or without PSERafael J. Wysocki2006-12-071-1/+1
| | | | | | | | | | | | | | | | Make swsusp support i386 systems with PAE or without PSE. This is done by creating temporary page tables located in resume-safe page frames before the suspend image is restored in the same way as x86_64 does it. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Andi Kleen <ak@suse.de> Cc: Dave Jones <davej@redhat.com> Cc: Nigel Cunningham <ncunningham@linuxmail.org> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] swsusp: thaw userspace and kernel space separatelyNigel Cunningham2006-12-071-7/+18
| | | | | | | | | | | | | Modify process thawing so that we can thaw kernel space without thawing userspace, and thaw kernelspace first. This will be useful in later patches, where I intend to get swsusp thawing kernel threads only before seeking to free memory. Signed-off-by: Nigel Cunningham <nigel@suspend2.net> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] swsusp: clean up whitespace in freezer outputNigel Cunningham2006-12-071-8/+8
| | | | | | | | | | Minor whitespace and formatting modifications for the freezer. Signed-off-by: Nigel Cunningham <nigel@suspend2.net> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] swsusp: quieten Freezer if !CONFIG_PM_DEBUGNigel Cunningham2006-12-071-1/+0
| | | | | | | | | | | | | | | The freezer currently prints an '=' for every process that is frozen. This is pretty pointless, as the equals sign says nothing about which process is frozen, and makes logs look messier (especially if there were a large number of processes running). All we really need to know is that we started trying to freeze processes and what processes (if any) failed to freeze, or that we succeeded. Signed-off-by: Nigel Cunningham <nigel@suspend2.net> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Add include/linux/freezer.h and move definitions from sched.hNigel Cunningham2006-12-078-1/+8
| | | | | | | | | | | | | Move process freezing functions from include/linux/sched.h to freezer.h, so that modifications to the freezer or the kernel configuration don't require recompiling just about everything. [akpm@osdl.org: fix ueagle driver] Signed-off-by: Nigel Cunningham <nigel@suspend2.net> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] swsusp: fix platform modeStefan Seyfried2006-12-071-0/+22
| | | | | | | | | | | | At some point after 2.6.13, in-kernel software suspend got "incomplete" for the so-called "platform" mode. pm_ops->prepare() is never called. A visible sign of this is the "moon" light on thinkpads not flashing during suspend. Fix by readding the pm_ops->prepare call during suspend. Signed-off-by: Stefan Seyfried <seife@suse.de> Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] swsusp: use __GFP_WAITRafael J. Wysocki2006-12-071-3/+3
| | | | | | | | | | swsusp uses GFP_ATOMIC, but it can afford to use __GFP_WAIT, which will permit it to reclaim clean pagecache instead of emitting scary page-allocation-failure messages. Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] swsusp: Improve handling of highmemRafael J. Wysocki2006-12-075-239/+671
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently swsusp saves the contents of highmem pages by copying them to the normal zone which is quite inefficient (eg. it requires two normal pages to be used for saving one highmem page). This may be improved by using highmem for saving the contents of saveable highmem pages. Namely, during the suspend phase of the suspend-resume cycle we try to allocate as many free highmem pages as there are saveable highmem pages. If there are not enough highmem image pages to store the contents of all of the saveable highmem pages, some of them will be stored in the "normal" memory. Next, we allocate as many free "normal" pages as needed to store the (remaining) image data. We use a memory bitmap to mark the allocated free pages (ie. highmem as well as "normal" image pages). Now, we use another memory bitmap to mark all of the saveable pages (highmem as well as "normal") and the contents of the saveable pages are copied into the image pages. Then, the second bitmap is used to save the pfns corresponding to the saveable pages and the first one is used to save their data. During the resume phase the pfns of the pages that were saveable during the suspend are loaded from the image and used to mark the "unsafe" page frames. Next, we try to allocate as many free highmem page frames as to load all of the image data that had been in the highmem before the suspend and we allocate so many free "normal" page frames that the total number of allocated free pages (highmem and "normal") is equal to the size of the image. While doing this we have to make sure that there will be some extra free "normal" and "safe" page frames for two lists of PBEs constructed later. Now, the image data are loaded, if possible, into their "original" page frames. The image data that cannot be written into their "original" page frames are loaded into "safe" page frames and their "original" kernel virtual addresses, as well as the addresses of the "safe" pages containing their copies, are stored in one of two lists of PBEs. One list of PBEs is for the copies of "normal" suspend pages (ie. "normal" pages that were saveable during the suspend) and it is used in the same way as previously (ie. by the architecture-dependent parts of swsusp). The other list of PBEs is for the copies of highmem suspend pages. The pages in this list are restored (in a reversible way) right before the arch-dependent code is called. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] swsusp: add ioctl for swap files supportRafael J. Wysocki2006-12-072-1/+44
| | | | | | | | | | | | | | | | To be able to use swap files as suspend storage from the userland suspend tools we need an additional ioctl() that will allow us to provide the kernel with both the swap header's offset and the identification of the resume partition. The new ioctl() should be regarded as a replacement for the SNAPSHOT_SET_SWAP_FILE ioctl() that from now on will be considered as obsolete, but has to stay for backwards compatibility of the interface. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] swsusp: add resume_offset command line parameterRafael J. Wysocki2006-12-073-5/+26
| | | | | | | | | | | | | | | Add the kernel command line parameter "resume_offset=" allowing us to specify the offset, in <PAGE_SIZE> units, from the beginning of the partition pointed to by the "resume=" parameter at which the swap header is located. This offset can be determined, for example, by an application using the FIBMAP ioctl to obtain the swap header's block number for given file. [akpm@osdl.org: we don't know what type sector_t is] Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] swsusp: use block device offsets to identify swap locationsRafael J. Wysocki2006-12-074-72/+80
| | | | | | | | | | | | | | Make swsusp use block device offsets instead of swap offsets to identify swap locations and make it use the same code paths for writing as well as for reading data. This allows us to use the same code for handling swap files and swap partitions and to simplify the code, eg. by dropping rw_swap_page_sync(). Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] swsusp: rearrange swap-handling codeRafael J. Wysocki2006-12-071-109/+112
| | | | | | | | | | | | Rearrange the code in kernel/power/swap.c so that the next patch is more readable. [This patch only moves the existing code.] Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] swsusp: use partition device and offset to identify swap areasRafael J. Wysocki2006-12-072-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Linux kernel handles swap files almost in the same way as it handles swap partitions and there are only two differences between these two types of swap areas: (1) swap files need not be contiguous, (2) the header of a swap file is not in the first block of the partition that holds it. From the swsusp's point of view (1) is not a problem, because it is already taken care of by the swap-handling code, but (2) has to be taken into consideration. In principle the location of a swap file's header may be determined with the help of appropriate filesystem driver. Unfortunately, however, it requires the filesystem holding the swap file to be mounted, and if this filesystem is journaled, it cannot be mounted during a resume from disk. For this reason we need some other means by which swap areas can be identified. For example, to identify a swap area we can use the partition that holds the area and the offset from the beginning of this partition at which the swap header is located. The following patch allows swsusp to identify swap areas this way. It changes swap_type_of() so that it takes an additional argument representing an offset of the swap header within the partition represented by its first argument. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] uswsusp: add pmops->{prepare,enter,finish} support (aka "platform mode")Stefan Seyfried2006-12-072-1/+34
| | | | | | | | | | | | | | | | | | | Add an ioctl to the userspace swsusp code that enables the usage of the pmops->prepare, pmops->enter and pmops->finish methods (the in-kernel suspend knows these as "platform method"). These are needed on many machines to (among others) speed up resuming by letting the BIOS skip some steps or let my hp nx5000 recognise the correct ac_adapter state after resume again. It also ensures on many machines, that changed hardware (unplugged AC adapters) gets correctly detected and that kacpid does not run wild after resume. Signed-off-by: Stefan Seyfried <seife@suse.de> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] slab: remove kmem_cache_tChristoph Lameter2006-12-077-14/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace all uses of kmem_cache_t with struct kmem_cache. The patch was generated using the following script: #!/bin/sh # # Replace one string by another in all the kernel sources. # set -e for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do quilt add $file sed -e "1,\$s/$1/$2/g" $file >/tmp/$$ mv /tmp/$$ $file quilt refresh done The script was run like this sh replace kmem_cache_t "struct kmem_cache" Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] slab: remove SLAB_KERNELChristoph Lameter2006-12-074-6/+6
| | | | | | | | SLAB_KERNEL is an alias of GFP_KERNEL. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] mm: pagefault_{disable,enable}()Peter Zijlstra2006-12-071-14/+14
| | | | | | | | | | | | | | | | | | | | Introduce pagefault_{disable,enable}() and use these where previously we did manual preempt increments/decrements to make the pagefault handler do the atomic thing. Currently they still rely on the increased preempt count, but do not rely on the disabled preemption, this might go away in the future. (NOTE: the extra barrier() in pagefault_disable might fix some holes on machines which have too many registers for their own good) [heiko.carstens@de.ibm.com: s390 fix] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Nick Piggin <npiggin@suse.de> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] new scheme to preempt swap tokenAshwin Chaugule2006-12-072-11/+8
| | | | | | | | | | | | | | | | | | | | The new swap token patches replace the current token traversal algo. The old algo had a crude timeout parameter that was used to handover the token from one task to another. This algo, transfers the token to the tasks that are in need of the token. The urgency for the token is based on the number of times a task is required to swap-in pages. Accordingly, the priority of a task is incremented if it has been badly affected due to swap-outs. To ensure that the token doesnt bounce around rapidly, the token holders are given a priority boost. The priority of tasks is also decremented, if their rate of swap-in's keeps reducing. This way, the condition to check whether to pre-empt the swap token, is a matter of comparing two task's priority fields. [akpm@osdl.org: cleanups] Signed-off-by: Ashwin Chaugule <ashwin.chaugule@celunite.com> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* Merge branch 'master' of ↵David Howells2006-12-053-1/+16
|\ | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: drivers/ata/libata-scsi.c include/linux/libata.h Futher merge of Linus's head and compilation fixups. Signed-Off-By: David Howells <dhowells@redhat.com>
| * [PATCH] severing skbuff.h -> highmem.hAl Viro2006-12-041-0/+1
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * [PATCH] severing module.h->sched.hAl Viro2006-12-042-1/+15
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | Merge branch 'master' of ↵David Howells2006-12-058-32/+63
|\ \ | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: drivers/infiniband/core/iwcm.c drivers/net/chelsio/cxgb2.c drivers/net/wireless/bcm43xx/bcm43xx_main.c drivers/net/wireless/prism54/islpci_eth.c drivers/usb/core/hub.h drivers/usb/input/hid-core.c net/core/netpoll.c Fix up merge failures with Linus's head and fix new compilation failures. Signed-Off-By: David Howells <dhowells@redhat.com>
| * [GENL]: Add genlmsg_put_reply() to simplify building reply headersThomas Graf2006-12-021-6/+2
| | | | | | | | | | | | | | | | | | | | By modyfing genlmsg_put() to take a genl_family and by adding genlmsg_put_reply() the process of constructing the netlink and generic netlink headers is simplified. Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [GENL]: Add genlmsg_new() to allocate generic netlink messagesThomas Graf2006-12-021-1/+1
| | | | | | | | | | | | Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [NETLINK]: Do precise netlink message allocations where possibleThomas Graf2006-12-021-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | Account for the netlink message header size directly in nlmsg_new() instead of relying on the caller calculate it correctly. Replaces error handling of message construction functions when constructing notifications with bug traps since a failure implies a bug in calculating the size of the skb. Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * Driver core: show drivers in /sys/module/Kay Sievers2006-12-011-6/+25
| | | | | | | | | | | | | | | | | | | | | | Show the drivers, which belong to the module: $ ls -l /sys/module/usbcore/drivers/ hub -> ../../../bus/usb/drivers/hub usb -> ../../../bus/usb/drivers/usb usbfs -> ../../../bus/usb/drivers/usbfs Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6Linus Torvalds2006-11-281-4/+5
| |\ | | | | | | | | | | | | | | | | | | | | | | | | * 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: [PATCH] x86-64: Use stricter in process stack check for unwinder [PATCH] i386: Fix compilation with UP genericarch [PATCH] x86-64: Fix warning in io_apic.c [PATCH] x86-64: work around gcc4 issue with -Os in Dwarf2 stack unwind [PATCH] x86_64: Align data segment to PAGE_SIZE boundary
| | * [PATCH] x86-64: work around gcc4 issue with -Os in Dwarf2 stack unwindJan Beulich2006-11-281-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes a problem with gcc4 mis-compiling the stack unwind code under -Os, which resulted in 'stuck' messages whenever an assembly routine was encountered. (The second hunk is trivial cleanup.) Signed-off-by: Jan Beulich <jbeulich@novell.com>
| * | [PATCH] fix create_write_pipe() error checkAkinobu Mita2006-11-281-4/+4
| |/ | | | | | | | | | | | | | | | | The return value of create_write_pipe()/create_read_pipe() should be checked by IS_ERR(). Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
| * [PATCH] lockdep: spin_lock_irqsave_nested()Arjan van de Ven2006-11-251-0/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce spin_lock_irqsave_nested(); implementation from: http://lkml.org/lkml/2006/6/1/122 Patch from: http://lkml.org/lkml/2006/9/13/258 [akpm@osdl.org: two compile fixes] Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Jiri Kosina <jikos@jikos.cz> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
| * [PATCH] fix copy_process() error checkAkinobu Mita2006-11-251-3/+2
| | | | | | | | | | | | | | | | The return value of copy_process() should be checked by IS_ERR(). Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
| * Don't call "note_interrupt()" with irq descriptor lock heldLinus Torvalds2006-11-222-7/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit f72fa707604c015a6625e80f269506032d5430dc, and solves the problem that it tried to fix by simply making "__do_IRQ()" call the note_interrupt() function without the lock held, the way everybody else does. It should be noted that all interrupt handling code must never allow the descriptor actors to be entered "recursively" (that's why we do all the magic IRQ_PENDING stuff in the first place), so there actually is exclusion at that much higher level, even in the absense of locking. Acked-by: Vivek Goyal <vgoyal@in.ibm.com> Acked-by:Pavel Emelianov <xemul@openvz.org> Cc: Andrew Morton <akpm@osdl.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Adrian Bunk <bunk@stusta.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | WorkStruct: make allyesconfigDavid Howells2006-11-221-4/+6
| | | | | | | | | | | | Fix up for make allyesconfig. Signed-Off-By: David Howells <dhowells@redhat.com>
* | WorkStruct: Pass the work_struct pointer instead of context dataDavid Howells2006-11-225-26/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pass the work_struct pointer to the work function rather than context data. The work function can use container_of() to work out the data. For the cases where the container of the work_struct may go away the moment the pending bit is cleared, it is made possible to defer the release of the structure by deferring the clearing of the pending bit. To make this work, an extra flag is introduced into the management side of the work_struct. This governs auto-release of the structure upon execution. Ordinarily, the work queue executor would release the work_struct for further scheduling or deallocation by clearing the pending bit prior to jumping to the work function. This means that, unless the driver makes some guarantee itself that the work_struct won't go away, the work function may not access anything else in the work_struct or its container lest they be deallocated.. This is a problem if the auxiliary data is taken away (as done by the last patch). However, if the pending bit is *not* cleared before jumping to the work function, then the work function *may* access the work_struct and its container with no problems. But then the work function must itself release the work_struct by calling work_release(). In most cases, automatic release is fine, so this is the default. Special initiators exist for the non-auto-release case (ending in _NAR). Signed-Off-By: David Howells <dhowells@redhat.com>
* | WorkStruct: Merge the pending bit into the wq_data pointerDavid Howells2006-11-221-9/+32
| | | | | | | | | | | | | | | | Reclaim a word from the size of the work_struct by folding the pending bit and the wq_data pointer together. This shouldn't cause misalignment problems as all pointers should be at least 4-byte aligned. Signed-Off-By: David Howells <dhowells@redhat.com>
* | WorkStruct: Typedef the work function prototypeDavid Howells2006-11-221-3/+3
| | | | | | | | | | | | | | | | | | Define a type for the work function prototype. It's not only kept in the work_struct struct, it's also passed as an argument to several functions. This makes it easier to change it. Signed-Off-By: David Howells <dhowells@redhat.com>
* | WorkStruct: Separate delayable and non-delayable events.David Howells2006-11-221-23/+28
|/ | | | | | | | | | | | Separate delayable work items from non-delayable work items be splitting them into a separate structure (delayed_work), which incorporates a work_struct and the timer_list removed from work_struct. The work_struct struct is huge, and this limits it's usefulness. On a 64-bit architecture it's nearly 100 bytes in size. This reduces that by half for the non-delayable type of event. Signed-Off-By: David Howells <dhowells@redhat.com>
* [PATCH] lockdep: fix static keys in module-allocated percpu areasIngo Molnar2006-11-171-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | lockdep got confused by certain locks in modules: INFO: trying to register non-static key. the code is fine but needs lockdep annotation. turning off the locking correctness validator. Call Trace: [<ffffffff8026f40d>] dump_trace+0xaa/0x3f2 [<ffffffff8026f78f>] show_trace+0x3a/0x60 [<ffffffff8026f9d1>] dump_stack+0x15/0x17 [<ffffffff802abfe8>] __lock_acquire+0x724/0x9bb [<ffffffff802ac52b>] lock_acquire+0x4d/0x67 [<ffffffff80267139>] rt_spin_lock+0x3d/0x41 [<ffffffff8839ed3f>] :ip_conntrack:__ip_ct_refresh_acct+0x131/0x174 [<ffffffff883a1334>] :ip_conntrack:udp_packet+0xbf/0xcf [<ffffffff8839f9af>] :ip_conntrack:ip_conntrack_in+0x394/0x4a7 [<ffffffff8023551f>] nf_iterate+0x41/0x7f [<ffffffff8025946a>] nf_hook_slow+0x64/0xd5 [<ffffffff802369a2>] ip_rcv+0x24e/0x506 [...] Steven Rostedt found the bug: static_obj() check did not take PERCPU_ENOUGH_ROOM into account, so in-module DEFINE_PER_CPU-area locks were triggering this message. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] some irq_chip variables point to NULLZhang, Yanmin2006-11-161-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I got an oops when booting 2.6.19-rc5-mm1 on my ia64 machine. Below is the log. Oops 11012296146944 [1] Modules linked in: binfmt_misc dm_mirror dm_multipath dm_mod thermal processor f an container button sg eepro100 e100 mii Pid: 0, CPU 0, comm: swapper psr : 0000121008022038 ifs : 800000000000040b ip : [<a0000001000e1411>] Not tainted ip is at __do_IRQ+0x371/0x3e0 unat: 0000000000000000 pfs : 000000000000040b rsc : 0000000000000003 rnat: 656960155aa56aa5 bsps: a00000010058b890 pr : 656960155aa55a65 ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c0270033f csd : 0000000000000000 ssd : 0000000000000000 b0 : a0000001000e1390 b6 : a0000001005beac0 b7 : e00000007f01aa00 f6 : 000000000000000000000 f7 : 0ffe69090000000000000 f8 : 1000a9090000000000000 f9 : 0ffff8000000000000000 f10 : 1000a908ffffff6f70000 f11 : 1003e0000000000000909 r1 : a000000100fbbff0 r2 : 0000000000010002 r3 : 0000000000010001 r8 : fffffffffffbffff r9 : a000000100bd8060 r10 : a000000100dd83b8 r11 : fffffffffffeffff r12 : a000000100bcbbb0 r13 : a000000100bc4000 r14 : 0000000000010000 r15 : 0000000000010000 r16 : a000000100c01aa8 r17 : a000000100d2c350 r18 : 0000000000000000 r19 : a000000100d2c300 r20 : a000000100c01a88 r21 : 0000000080010100 r22 : a000000100c01ac0 r23 : a0000001000108e0 r24 : e000000477980004 r25 : 0000000000000000 r26 : 0000000000000000 r27 : e00000000913400c r28 : e0000004799ee51c r29 : e0000004778b87f0 r30 : a000000100d2c300 r31 : a00000010005c7e0 Call Trace: [<a000000100014600>] show_stack+0x40/0xa0 sp=a000000100bcb760 bsp=a000000100bc4f40 [<a000000100014f00>] show_regs+0x840/0x880 sp=a000000100bcb930 bsp=a000000100bc4ee8 [<a000000100037fb0>] die+0x250/0x320 sp=a000000100bcb930 bsp=a000000100bc4ea0 [<a00000010005e5f0>] ia64_do_page_fault+0x8d0/0xa20 sp=a000000100bcb950 bsp=a000000100bc4e50 [<a00000010000caa0>] ia64_leave_kernel+0x0/0x290 sp=a000000100bcb9e0 bsp=a000000100bc4e50 [<a0000001000e1410>] __do_IRQ+0x370/0x3e0 sp=a000000100bcbbb0 bsp=a000000100bc4df0 [<a000000100011f50>] ia64_handle_irq+0x170/0x220 sp=a000000100bcbbb0 bsp=a000000100bc4dc0 [<a00000010000caa0>] ia64_leave_kernel+0x0/0x290 sp=a000000100bcbbb0 bsp=a000000100bc4dc0 [<a000000100012390>] ia64_pal_call_static+0x90/0xc0 sp=a000000100bcbd80 bsp=a000000100bc4d78 [<a000000100015630>] default_idle+0x90/0x160 sp=a000000100bcbd80 bsp=a000000100bc4d58 [<a000000100014290>] cpu_idle+0x1f0/0x440 sp=a000000100bcbe20 bsp=a000000100bc4d18 [<a000000100009980>] rest_init+0xc0/0xe0 sp=a000000100bcbe20 bsp=a000000100bc4d00 [<a0000001009f8ea0>] start_kernel+0x6a0/0x6c0 sp=a000000100bcbe20 bsp=a000000100bc4ca0 [<a0000001000089f0>] __end_ivt_text+0x6d0/0x6f0 sp=a000000100bcbe30 bsp=a000000100bc4c00 <0>Kernel panic - not syncing: Aiee, killing interrupt handler! The root cause is that some irq_chip variables, especially ia64_msi_chip, initiate their memeber end to point to NULL. __do_IRQ doesn't check if irq_chip->end is null and just calls it after processing the interrupt. As irq_chip->end is called at many places, so I fix it by reinitiating irq_chip->end to dummy_irq_chip.end, e.g., a noop function. Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* Revert "[PATCH] fix Data Acess error in dup_fd"Linus Torvalds2006-11-141-1/+0
| | | | | | | | | | | | | | | | | | | | | | This reverts commit 0130b0b32ee53dc7add773fcea984f6a26ef1da3. Sergey Vlasov points out (and Vadim Lobanov concurs) that the bug it was supposed to fix must be some unrelated memory corruption, and the "fix" actually causes more problems: "However, the new code does not look safe in all cases. If some other task has opened more files while dup_fd() released oldf->file_lock, the new code will update open_files to the new larger value. But newf was allocated with the old smaller value of open_files, therefore subsequent accesses to newf may try to write into unallocated memory." so revert it. Cc: Sharyathi Nagesh <sharyath@in.ibm.com> Cc: Sergey Vlasov <vsu@altlinux.ru> Cc: Vadim Lobanov <vlobanov@speakeasy.net> Cc: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] setup_irq(): better mismatch debuggingAndrew Morton2006-11-141-2/+7
| | | | | | | | | | | When we get a mismatch between handlers on the same IRQ, all we get is "IRQ handler type mismatch for IRQ n". Let's print the name of the presently-registered handler with which we got the mismatch. Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Fix misrouted interrupts deadlocksPavel Emelianov2006-11-131-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While testing kernel on machine with "irqpoll" option I've caught such a lockup: __do_IRQ() spin_lock(&desc->lock); desc->chip->ack(); /* IRQ is ACKed */ note_interrupt() misrouted_irq() handle_IRQ_event() if (...) local_irq_enable_in_hardirq(); /* interrupts are enabled from now */ ... __do_IRQ() /* same IRQ we've started from */ spin_lock(&desc->lock); /* LOCKUP */ Looking at misrouted_irq() code I've found that a potential deadlock like this can also take place: 1CPU: __do_IRQ() spin_lock(&desc->lock); /* irq = A */ misrouted_irq() for (i = 1; i < NR_IRQS; i++) { spin_lock(&desc->lock); /* irq = B */ if (desc->status & IRQ_INPROGRESS) { 2CPU: __do_IRQ() spin_lock(&desc->lock); /* irq = B */ misrouted_irq() for (i = 1; i < NR_IRQS; i++) { spin_lock(&desc->lock); /* irq = A */ if (desc->status & IRQ_INPROGRESS) { As the second lock on both CPUs is taken before checking that this irq is being handled in another processor this may cause a deadlock. This issue is only theoretical. I propose the attached patch to fix booth problems: when trying to handle misrouted IRQ active desc->lock may be unlocked. Acked-by: Ingo Molnar <mingo@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] fix Data Acess error in dup_fdSharyathi Nagesh2006-11-131-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On running the Stress Test on machine for more than 72 hours following error message was observed. 0:mon> e cpu 0x0: Vector: 300 (Data Access) at [c00000007ce2f7f0] pc: c000000000060d90: .dup_fd+0x240/0x39c lr: c000000000060d6c: .dup_fd+0x21c/0x39c sp: c00000007ce2fa70 msr: 800000000000b032 dar: ffffffff00000028 dsisr: 40000000 current = 0xc000000074950980 paca = 0xc000000000454500 pid = 27330, comm = bash 0:mon> t [c00000007ce2fa70] c000000000060d28 .dup_fd+0x1d8/0x39c (unreliable) [c00000007ce2fb30] c000000000060f48 .copy_files+0x5c/0x88 [c00000007ce2fbd0] c000000000061f5c .copy_process+0x574/0x1520 [c00000007ce2fcd0] c000000000062f88 .do_fork+0x80/0x1c4 [c00000007ce2fdc0] c000000000011790 .sys_clone+0x5c/0x74 [c00000007ce2fe30] c000000000008950 .ppc_clone+0x8/0xc The problem is because of race window. When if(expand) block is executed in dup_fd unlocking of oldf->file_lock give a window for fdtable in oldf to be modified. So actual open_files in oldf may not match with open_files variable. Cc: Vadim Lobanov <vlobanov@speakeasy.net> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] sysctl: allow a zero ctl_name in the middle of a sysctl tableEric W. Biederman2006-11-061-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Since it is becoming clear that there are just enough users of the binary sysctl interface that completely removing the binary interface from the kernel will not be an option for foreseeable future, we need to find a way to address the sysctl maintenance issues. The basic problem is that sysctl requires one central authority to allocate sysctl numbers, or else conflicts and ABI breakage occur. The proc interface to sysctl does not have that problem, as names are not densely allocated. By not terminating a sysctl table until I have neither a ctl_name nor a procname, it becomes simple to add sysctl entries that don't show up in the binary sysctl interface. Which allows people to avoid allocating a binary sysctl value when not needed. I have audited the kernel code and in my reading I have not found a single sysctl table that wasn't terminated by a completely zero filled entry. So this change in behavior should not affect anything. I think this mechanism eases the pain enough that combined with a little disciple we can solve the reoccurring sysctl ABI breakage. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Improve the removed sysctl warningsEric W. Biederman2006-11-061-1/+21
| | | | | | | | | | | | | Don't warn about libpthread's access to kernel.version. When it receives -ENOSYS it will read /proc/sys/kernel/version. If anything else shows up print the sysctl number string. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Cal Peake <cp@absolutedigital.net> Cc: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] lockdep: fix delayacct locking bugPeter Zijlstra2006-11-061-6/+9
| | | | | | | | | | | | | Make the delayacct lock irqsave; this avoids the possible deadlock where an interrupt is taken while holding the delayacct lock which needs to take the delayacct lock. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Shailabh Nagar <nagar@watson.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Fix the spurious unlock_cpu_hotplug false warningsGautham R Shenoy2006-11-061-1/+1
| | | | | | | | | | | | | | | | | | | Cpu-hotplug locking has a minor race case caused because of setting the variable "recursive" to NULL *after* releasing the cpu_bitmask_lock in the function unlock_cpu_hotplug,instead of doing so before releasing the cpu_bitmask_lock. This was the cause of most of the recent false spurious lock_cpu_unlock warnings. This should fix the problem reported by Martin Lorenz reported in http://lkml.org/lkml/2006/10/29/127. Thanks to Srinivasa DS for pointing it out. Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* Make sure "user->sigpending" count is in syncLinus Torvalds2006-11-041-4/+11
| | | | | | | | | | | | | | | | | | | | | | | | | The previous commit (45c18b0bb579b5c1b89f8c99f1b6ffa4c586ba08, aka "Fix unlikely (but possible) race condition on task->user access") fixed a potential oops due to __sigqueue_alloc() getting its "user" pointer out of sync with switch_user(), and accessing a user pointer that had been de-allocated on another CPU. It still left another (much less serious) problem, where a concurrent __sigqueue_alloc and swich_user could cause sigqueue_alloc to do signal pending reference counting for a _different_ user than the one it then actually ended up using. No oops, but we'd end up with the wrong signal accounting. Another case of Oleg's eagle-eyes picking up the problem. This is trivially fixed by just making sure we load whichever "user" structure we decide to use (it doesn't matter _which_ one we pick, we just need to pick one) just once. Acked-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Andrew Morton <akpm@osdl.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* Fix unlikely (but possible) race condition on task->user accessLinus Torvalds2006-11-041-0/+11
| | | | | | | | | | | | | | | | | | | | | | There's a possible race condition when doing a "switch_uid()" from one user to another, which could race with another thread doing a signal allocation and looking at the old thread ->user pointer as it is freed. This explains an oops reported by Lukasz Trabinski: http://permalink.gmane.org/gmane.linux.kernel/462241 We fix this by delaying the (reference-counted) freeing of the user structure until the thread signal handler lock has been released, so that we know that the signal allocation has either seen the new value or has properly incremented the reference count of the old one. Race identified by Oleg Nesterov. Cc: Lukasz Trabinski <lukasz@wsisiz.edu.pl> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Andrew Morton <akpm@osdl.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Create compat_sys_migrate_pagesStephen Rothwell2006-11-032-0/+34
| | | | | | | | | This is needed on bigendian 64bit architectures. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
OpenPOWER on IntegriCloud