path: root/arch/powerpc/kernel
Commit message (Author, Age, Files, Lines)
* KVM: PPC: Introduce kvm_tmp framework (Alexander Graf, 2010-10-24, 1 file, -2/+40)
| | | | | | | | | | | | | | | | We will soon require more sophisticated methods to replace single instructions with multiple instructions. We do that by branching to a memory region into which we write the replacement code for the instruction. This region needs to be within 32 MB of the patched instruction though, because that's the furthest we can jump with immediate branches. So we keep 1 MB of free space around in bss. After we're done initializing we can just tell the mm system that the unused pages are free, but until then we have enough space to fit all our code in. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
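The 32 MB limit mentioned above follows from the PowerPC I-form branch encoding, which carries a 24-bit word displacement (a signed 26-bit byte offset). The following is a minimal sketch of the range check and encoding, not the code from the patch; the names `branch_in_range`, `make_branch`, `INST_B` and `INST_B_MASK` are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reach of a relative I-form branch: signed 26-bit byte displacement. */
#define BRANCH_MIN (-(int64_t)0x2000000) /* -32 MB */
#define BRANCH_MAX ((int64_t)0x1fffffc)  /* +32 MB minus one instruction */

#define INST_B      0x48000000u /* primary opcode 18, AA=0, LK=0 */
#define INST_B_MASK 0x03fffffcu /* LI displacement field */

/* True if a relative branch at 'from' can reach 'to'. */
static bool branch_in_range(uint64_t from, uint64_t to)
{
    int64_t off = (int64_t)(to - from);

    return off >= BRANCH_MIN && off <= BRANCH_MAX;
}

/* Encode "b to" for an instruction located at 'from'.
 * Returns 0 when the target is out of reach (hypothetical convention). */
static uint32_t make_branch(uint64_t from, uint64_t to)
{
    /* PowerPC instructions are word aligned. */
    assert(((from | to) & 3) == 0);

    if (!branch_in_range(from, to))
        return 0;
    return INST_B | ((uint32_t)(to - from) & INST_B_MASK);
}
```

With a scratch region kept inside the kernel image, every patched instruction in the text section is likely to pass `branch_in_range()`, which is the property the commit message relies on.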
* KVM: PPC: PV tlbsync to nop (Alexander Graf, 2010-10-24, 1 file, -0/+12)
| | | | | | | | With our current MMU scheme we don't need to know about the tlbsync instruction. So we can just nop it out. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: PPC: PV instructions to loads and stores (Alexander Graf, 2010-10-24, 1 file, -0/+109)
| | | | | | | | | | Some instructions can simply be replaced by load and store instructions to or from the magic page. This patch replaces often called instructions that fall into the above category. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: PPC: KVM PV guest stubs (Alexander Graf, 2010-10-24, 1 file, -0/+95)
| | | | | | | | | | | | We will soon start to replace instructions from the text section with other, paravirtualized versions. To ease the readability of those patches I split out the generic looping and magic page mapping code. This patch still only contains stubs. But at least it loops through the text section :). Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: PPC: Generic KVM PV guest support (Alexander Graf, 2010-10-24, 4 files, -1/+55)
| | | | | | | | | | | We have all the hypervisor pieces in place now, but the guest parts are still missing. This patch implements basic awareness of KVM when running Linux as guest. It doesn't do anything with it yet though. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: PPC: Implement hypervisor interface (Alexander Graf, 2010-10-24, 2 files, -0/+70)
| | | | | | | | | | | | | | To communicate with KVM directly we need to plumb some sort of interface between the guest and KVM. Usually those interfaces use hypercalls. This hypercall implementation is described in the last patch of the series in a special documentation file. Please read that for further information. This patch implements stubs to handle KVM PPC hypercalls on the host and guest side alike. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: PPC: Convert MSR to shared page (Alexander Graf, 2010-10-24, 1 file, -1/+1)
| | | | | | | | | | | | | | | One of the most obvious registers to share with the guest directly is the MSR. The MSR contains the "interrupts enabled" flag which the guest has to toggle in critical sections. So in order to bring the overhead of interrupt en- and disabling down, let's put msr into the shared page. Keep in mind that even though you can fully read its contents, writing to it doesn't always update all state. There are a few safe fields that don't require hypervisor interaction. See the documentation for a list of MSR bits that are safe to be set from inside the guest. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: PPC: Introduce shared page (Alexander Graf, 2010-10-24, 1 file, -0/+1)
| | | | | | | | | | | | For transparent variable sharing between the hypervisor and guest, I introduce a shared page. This shared page will contain all the registers the guest can read and write safely without exiting guest context. This patch only implements the stubs required for the basic structure of the shared page. The actual register moving follows. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
* powerpc: remove unused variable (Stephen Rothwell, 2010-10-05, 1 file, -1/+0)
| | | | | | | | | | | | Since powerpc uses -Werror on arch powerpc, the build was broken like this: cc1: warnings being treated as errors arch/powerpc/kernel/module.c: In function 'module_finalize': arch/powerpc/kernel/module.c:66: error: unused variable 'err' Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* modules: Fix module_bug_list list corruption race (Linus Torvalds, 2010-10-05, 1 file, -5/+0)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With all the recent module loading cleanups, we've minimized the code that sits under module_mutex, fixing various deadlocks and making it possible to do most of the module loading in parallel. However, that whole conversion totally missed the rather obscure code that adds a new module to the list for BUG() handling. That code was doubly obscure because (a) the code itself lives in lib/bug.c (for dubious reasons) and (b) it gets called from the architecture-specific "module_finalize()" rather than from generic code. Calling it from arch-specific code makes no sense whatsoever to begin with, and is now actively wrong since that code isn't protected by the module loading lock any more. So this commit moves the "module_bug_{finalize,cleanup}()" calls away from the arch-specific code, and into the generic code - and in the process protects it with the module_mutex so that the list operations are now safe. Future fixups: - move the module list handling code into kernel/module.c where it belongs. - get rid of 'module_bug_list' and just use the regular list of modules (called 'modules' - imagine that) that we already create and maintain for other reasons. Reported-and-tested-by: Thomas Gleixner <tglx@linutronix.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Adrian Bunk <bunk@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* powerpc: fix double syscall restarts (Al Viro, 2010-09-22, 3 files, -3/+4)
| | | | | | | | | | | | | | Make sigreturn zero regs->trap, make do_signal() do the same on all paths. As it is, signal interrupting e.g. read() from fd 512 (== ERESTARTSYS) with another signal getting unblocked when the first handler finishes will lead to restart one insn earlier than it ought to. Same for multiple signals with in-kernel handlers interrupting that sucker at the same time. Same for multiple signals of any kind interrupting that sucker on 64bit... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* powerpc: Don't use kernel stack with translation off (Michael Neuling, 2010-08-31, 1 file, -3/+9)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In f761622e59433130bc33ad086ce219feee9eb961 we changed early_setup_secondary so it's called using the proper kernel stack rather than the emergency one. Unfortunately, this stack pointer can't be used when translation is off on PHYP as this stack pointer might be outside the RMO. This results in the following on all non zero cpus: cpu 0x1: Vector: 300 (Data Access) at [c00000001639fd10] pc: 000000000001c50c lr: 000000000000821c sp: c00000001639ff90 msr: 8000000000001000 dar: c00000001639ffa0 dsisr: 42000000 current = 0xc000000016393540 paca = 0xc000000006e00200 pid = 0, comm = swapper The original patch was only tested on bare metal system, so it never caught this problem. This changes __secondary_start so that we calculate the new stack pointer but only start using it after we've called early_setup_secondary. With this patch, the above problem goes away. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* powerpc/perf_event: Reduce latency of calling perf_event_do_pending (Paul Mackerras, 2010-08-31, 1 file, -12/+11)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 0fe1ac48 ("powerpc/perf_event: Fix oops due to perf_event_do_pending call") moved the call to perf_event_do_pending in timer_interrupt() down so that it was after the irq_enter() call. Unfortunately this moved it after the code that checks whether it is time for the next decrementer clock event. The result is that the call to perf_event_do_pending() won't happen until the next decrementer clock event is due. This was pointed out by Milton Miller. This fixes it by moving the check for whether it's time for the next decrementer clock event down to the point where we're about to call the event handler, after we've called perf_event_do_pending. This has the side effect that on old pre-Core99 Powermacs where we use the ppc_n_lost_interrupts mechanism to replay interrupts, a replayed interrupt will incur a little more latency since it will now do the code from the irq_enter down to the irq_exit, that it used to skip. However, these machines are now old and rare enough that this doesn't matter. To make it clear that ppc_n_lost_interrupts is only used on Powermacs, and to speed up the code slightly on non-Powermac ppc32 machines, the code that tests ppc_n_lost_interrupts is now conditional on CONFIG_PMAC as well as CONFIG_PPC32. Signed-off-by: Paul Mackerras <paulus@samba.org> Cc: stable@kernel.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* powerpc/kexec: Adds correct calling convention for kexec purgatory (Matthew McClintock, 2010-08-31, 1 file, -0/+3)
| | | | | | | | | | | | | | | | | | Call kexec purgatory code correctly. We were getting lucky before. If you examine the powerpc 32bit kexec "purgatory" code you will see it expects the following: From kexec-tools: purgatory/arch/ppc/v2wrap_32.S -> calling convention: -> r3 = physical number of this cpu (all cpus) -> r4 = address of this chunk (master only) As such, we need to set r3 to the current core. r4 happens to be unused by purgatory at the moment, but we go ahead and set it here as well. Signed-off-by: Matthew McClintock <msm@freescale.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* powerpc: Wire up fanotify_init, fanotify_mark, prlimit64 syscalls (Andreas Schwab, 2010-08-24, 1 file, -0/+8)
| | | | | Signed-off-by: Andreas Schwab <schwab@linux-m68k.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* powerpc/pci: Fix checking for child bridges in PCI code. (Grant Likely, 2010-08-24, 1 file, -1/+2)
| | | | | | | | | | pci_device_to_OF_node() can return null, and list_for_each_entry will never enter the loop when dev is NULL, so it looks like this test is a typo. Reported-by: Julia Lawall <julia@diku.dk> Signed-off-by: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* powerpc: Initialise paca->kstack before early_setup_secondary (Matt Evans, 2010-08-24, 1 file, -3/+3)
| | | | | | | | | | | | | | | | | | As early setup calls down to slb_initialize(), we must have kstack initialised before checking "should we add a bolted SLB entry for our kstack?" Failing to do so means stack access requires an SLB miss exception to refill an entry dynamically, if the stack isn't accessible via SLB(0) (kernel text & static data). It's not always allowable to take such a miss, and intermittent crashes will result. Primary CPUs don't have this issue; an SLB entry is not bolted for their stack anyway (as that lives within SLB(0)). This patch therefore only affects the init of secondaries. Signed-off-by: Matt Evans <matt@ozlabs.org> Cc: stable <stable@kernel.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* powerpc: Fix bogus it_blocksize in VIO iommu code (Anton Blanchard, 2010-08-24, 1 file, -1/+2)
| | | | | | | | | | | | | | | | | | | | | | | | | | When looking at some issues with the virtual ethernet driver I noticed that TCE allocation was following a very strange pattern: address 00e9000 length 2048 address 0409000 length 2048 <----- address 0429000 length 2048 address 0449000 length 2048 address 0469000 length 2048 address 0489000 length 2048 address 04a9000 length 2048 address 04c9000 length 2048 address 04e9000 length 2048 address 4009000 length 2048 <----- address 4029000 length 2048 Huge unexplained gaps in what should be an empty TCE table. It turns out it_blocksize, the amount we want to align the next allocation to, was c0000000fe903b20. Completely bogus. Initialise it to something reasonable in the VIO IOMMU code, and use kzalloc everywhere to protect against this when we next add a non-compulsory field to iommu code and forget to initialise it. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* powerpc: Inline ppc64_runlatch_off (Anton Blanchard, 2010-08-24, 1 file, -8/+6)
| | | | | | | | | | I'm sick of seeing ppc64_runlatch_off in our profiles, so inline it into the callers. To avoid a mess of circular includes I didn't add it as an inline function. Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Olof Johansson <olof@lixom.net> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* powerpc: Correct smt_enabled=X boot option for > 2 threads per core (Nathan Fontenot, 2010-08-24, 1 file, -27/+36)
| | | | | | | | | | | The 'smt_enabled=X' boot option does not handle values of X > 2. For Power 7 processors with smt modes of 0,1,2,3, and 4 this does not work. This patch allows the smt_enabled option to be set to any value limited to a max equal to the number of threads per core. Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
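The clamping behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the kernel's parser: the function name `clamp_smt_enabled` is hypothetical, only numeric values are handled, and falling back to the full thread count on unparsable input is an assumption made for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Parse the numeric value of smt_enabled=X and limit it to the number
 * of hardware threads per core. Returns the number of threads to use. */
static int clamp_smt_enabled(const char *arg, int threads_per_core)
{
    char *end;
    long val;

    assert(arg != NULL && threads_per_core > 0);

    val = strtol(arg, &end, 10);
    if (end == arg || *end != '\0' || val < 0)
        return threads_per_core; /* assumed fallback for bad input */

    return val > threads_per_core ? threads_per_core : (int)val;
}
```

On a core with four threads, a request for 3 would be honoured as is, while a request for 8 would be limited to 4.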
* powerpc: Silence __cpu_up() under normal operation (Darren Hart, 2010-08-24, 1 file, -2/+2)
| | | | | | | | | | | | | | | | | | | | | | | During CPU offline/online tests __cpu_up would flood the logs with the following message: Processor 0 found. This provides no useful information to the user as there is no context provided, and since the operation was a success (to this point) it is expected that the CPU will come back online, providing all the feedback necessary. Change the "Processor found" message to DBG() similar to other such messages in the same function. Also, add an appropriate log level for the "Processor is stuck" message. Signed-off-by: Darren Hart <dvhltc@us.ibm.com> Acked-by: Will Schmidt <will_schmidt@vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Nathan Fontenot <nfont@austin.ibm.com> Cc: Robert Jennings <rcj@linux.vnet.ibm.com> Cc: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* powerpc: Re-enable preemption before cpu_die() (Darren Hart, 2010-08-24, 1 file, -1/+1)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | start_secondary() is called shortly after _start and also via cpu_idle()->cpu_die()->pseries_mach_cpu_die() start_secondary() expects a preempt_count() of 0. pseries_mach_cpu_die() is called via the cpu_idle() routine with preemption disabled, resulting in the following repeating message during rapid cpu offline/online tests with CONFIG_PREEMPT=y: BUG: scheduling while atomic: swapper/0/0x00000002 Modules linked in: autofs4 binfmt_misc dm_mirror dm_region_hash dm_log [last unloaded: scsi_wait_scan] Call Trace: [c00000010e7079c0] [c0000000000133ec] .show_stack+0xd8/0x218 (unreliable) [c00000010e707aa0] [c0000000006a47f0] .dump_stack+0x28/0x3c [c00000010e707b20] [c00000000006e7a4] .__schedule_bug+0x7c/0x9c [c00000010e707bb0] [c000000000699d9c] .schedule+0x104/0x800 [c00000010e707cd0] [c000000000015b24] .cpu_idle+0x1c4/0x1d8 [c00000010e707d70] [c0000000006aa1b4] .start_secondary+0x398/0x3d4 [c00000010e707e30] [c000000000008278] .start_secondary_resume+0x10/0x14 Move the cpu_die() call inside the existing preemption enabled block of cpu_idle(). This is safe as the idle task is affined to a single CPU so the debug_smp_processor_id() tests (from cpu_should_die()) won't trigger as we are in a "migration disabled" region. Signed-off-by: Darren Hart <dvhltc@us.ibm.com> Acked-by: Will Schmidt <will_schmidt@vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Nathan Fontenot <nfont@austin.ibm.com> Cc: Robert Jennings <rcj@linux.vnet.ibm.com> Cc: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* powerpc/pci: Drop unnecessary null test (Julia Lawall, 2010-08-24, 1 file, -2/+1)
| | | | | | | | | | | | | | | | | | | | | | | | list_for_each_entry binds its first argument to a non-null value, and thus any null test on the value of that argument is superfluous. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ iterator I; expression x,E,E1,E2; statement S,S1,S2; @@ I(x,...) { <... - if (x != NULL || ...) S ...> } // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* powerpc/kdump: Stop all other CPUs before running crash handlers (Anton Blanchard, 2010-08-24, 1 file, -11/+13)
| | | | | | | | | | | During kdump we run the crash handlers first then stop all other CPUs. We really want to stop all CPUs as close to the fail as possible and also have a very controlled environment for running the crash handlers, so it makes sense to reverse the order. Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* powerpc: Use is_32bit_task() helper to test 32 bit binary (Denis Kirjanov, 2010-08-24, 1 file, -3/+3)
| | | | | | | Use is_32bit_task() helper to test 32 bit binary. Signed-off-by: Denis Kirjanov <dkirjanov@kernel.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* Merge remote branch 'jwb/merge' into merge (Benjamin Herrenschmidt, 2010-08-24, 4 files, -12/+18)
|\
| * powerpc/4xx: Index interrupt stacks by physical cpu (Dave Kleikamp, 2010-08-23, 2 files, -11/+14)
| | | | | | | | | | | | | | | | | | The interrupt stacks need to be indexed by the physical cpu since the critical, debug and machine check handlers use the contents of SPRN_PIR to index the critirq_ctx, dbgirq_ctx, and mcheckirq_ctx arrays. Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
| * powerpc/47x: Remove redundant line from cputable.c (Dave Kleikamp, 2010-08-23, 1 file, -1/+0)
| | | | | | | | | | | | | | | | There are two entries for .cpu_user_features in arch/powerpc/kernel/cputable.c. Remove the one that doesn't belong Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
| * powerpc/47x: Make sure mcsr is cleared before enabling machine check interrupts (Dave Kleikamp, 2010-08-23, 1 file, -0/+4)
| | | | | | | | | | | | | | | | | | Clear the machine check syndrome register before enabling machine check interrupts. The initial state of the tlb can lead to parity errors being flagged early after a cold boot. Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
* | Make do_execve() take a const filename pointer (David Howells, 2010-08-17, 1 file, -2/+3)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make do_execve() take a const filename pointer so that kernel_execve() compiles correctly on ARM: arch/arm/kernel/sys_arm.c:88: warning: passing argument 1 of 'do_execve' discards qualifiers from pointer target type This also requires the argv and envp arguments to be consted twice, once for the pointer array and once for the strings the array points to. This is because do_execve() passes a pointer to the filename (now const) to copy_strings_kernel(). A simpler alternative would be to cast the filename pointer in do_execve() when it's passed to copy_strings_kernel(). do_execve() may not change any of the strings it is passed as part of the argv or envp lists, as some of them are in .rodata, so marking these strings as const should be fine. Further, kernel_execve() and sys_execve() need to be changed to match. This has been test built on x86_64, frv, arm and mips. Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Mark arguments to certain syscalls as being const (David Howells, 2010-08-13, 2 files, -2/+2)
|/ | | | | | | | | | | | | | | | Mark arguments to certain system calls as being const where they should be but aren't. The list includes: (*) The filename arguments of various stat syscalls, execve(), various utimes syscalls and some mount syscalls. (*) The filename arguments of some syscall helpers relating to the above. (*) The buffer argument of various write syscalls. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* powerpc: fix i8042 module build error (Grant Likely, 2010-08-06, 1 file, -0/+2)
| | | | | | of_i8042_{kbd,aux}_irq needs to be exported Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
* Merge branch 'timers-timekeeping-for-linus' of ↵ (Linus Torvalds, 2010-08-06, 1 file, -33/+27)
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-timekeeping-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: um: Fix read_persistent_clock fallout kgdb: Do not access xtime directly powerpc: Clean up obsolete code relating to decrementer and timebase powerpc: Rework VDSO gettimeofday to prevent time going backwards clocksource: Add __clocksource_updatefreq_hz/khz methods x86: Convert common clocksources to use clocksource_register_hz/khz timekeeping: Make xtime and wall_to_monotonic static hrtimer: Cleanup direct access to wall_to_monotonic um: Convert to use read_persistent_clock timkeeping: Fix update_vsyscall to provide wall_to_monotonic offset powerpc: Cleanup xtime usage powerpc: Simplify update_vsyscall time: Kill off CONFIG_GENERIC_TIME time: Implement timespec_add x86: Fix vtime/file timestamp inconsistencies Trivial conflicts in Documentation/feature-removal-schedule.txt Much less trivial conflicts in arch/powerpc/kernel/time.c resolved as per Thomas' earlier merge commit 47916be4e28c ("Merge branch 'powerpc.cherry-picks' into timers/clocksource")
| * Merge branch 'powerpc.cherry-picks' into timers/clocksource (Thomas Gleixner, 2010-07-28, 5 files, -345/+72)
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: arch/powerpc/kernel/time.c Reason: The powerpc next tree contains two commits which conflict with the timekeeping changes: 8fd63a9e powerpc: Rework VDSO gettimeofday to prevent time going backwards c1aa687d powerpc: Clean up obsolete code relating to decrementer and timebase John Stultz identified them and provided the conflict resolution. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| | * powerpc: Clean up obsolete code relating to decrementer and timebase (Paul Mackerras, 2010-07-28, 2 files, -135/+3)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since the decrementer and timekeeping code was moved over to using the generic clockevents and timekeeping infrastructure, several variables and functions have been obsolete and effectively unused. This deletes them. In particular, wakeup_decrementer() is no longer needed since the generic code reprograms the decrementer as part of the process of resuming the timekeeping code, which happens during sysdev resume. Thus the wakeup_decrementer calls in the suspend_enter methods for 52xx platforms have been removed. The call in the powermac cpu frequency change code has been replaced by set_dec(1), which will cause a timer interrupt as soon as interrupts are enabled, and the generic code will then reprogram the decrementer with the correct value. This also simplifies the generic_suspend_en/disable_irqs functions and makes them static since they are not referenced outside time.c. The preempt_enable/disable calls are removed because the generic code has disabled all but the boot cpu at the point where these functions are called, so we can't be moved to another cpu. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| | * powerpc: Rework VDSO gettimeofday to prevent time going backwards (Paul Mackerras, 2010-07-28, 4 files, -237/+97)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently it is possible for userspace to see the result of gettimeofday() going backwards by 1 microsecond, assuming that userspace is using the gettimeofday() in the VDSO. The VDSO gettimeofday() algorithm computes the time in "xsecs", which are units of 2^-20 seconds, or approximately 0.954 microseconds, using the algorithm now = (timebase - tb_orig_stamp) * tb_to_xs + stamp_xsec and then converts the time in xsecs to seconds and microseconds. The kernel updates the tb_orig_stamp and stamp_xsec values every tick in update_vsyscall(). If the length of the tick is not an integer number of xsecs, then some precision is lost in converting the current time to xsecs. For example, with CONFIG_HZ=1000, the tick is 1ms long, which is 1048.576 xsecs. That means that stamp_xsec will advance by either 1048 or 1049 on each tick. With the right conditions, it is possible for userspace to get (timebase - tb_orig_stamp) * tb_to_xs being 1049 if the kernel is slightly late in updating the vdso_datapage, and then for stamp_xsec to advance by 1048 when the kernel does update it, and for userspace to then see (timebase - tb_orig_stamp) * tb_to_xs being zero due to integer truncation. The result is that time appears to go backwards by 1 microsecond. To fix this we change the VDSO gettimeofday to use a new field in the VDSO datapage which stores the nanoseconds part of the time as a fractional number of seconds in a 0.32 binary fraction format. (Or put another way, as a 32-bit number in units of 0.23283 ns.) 
This is convenient because we can use the mulhwu instruction to convert it to either microseconds or nanoseconds. Since it turns out that computing the time of day using this new field is simpler than either using stamp_xsec (as gettimeofday does) or stamp_xtime.tv_nsec (as clock_gettime does), this converts both gettimeofday and clock_gettime to use the new field. The existing __do_get_tspec function is converted to use the new field and take a parameter in r7 that indicates the desired resolution, 1,000,000 for microseconds or 1,000,000,000 for nanoseconds. The __do_get_xsec function is then unused and is deleted. The new algorithm is now = ((timebase - tb_orig_stamp) << 12) * tb_to_xs + (stamp_xtime_seconds << 32) + stamp_sec_fraction with 'now' in units of 2^-32 seconds. That is then converted to seconds and either microseconds or nanoseconds with seconds = now >> 32 partseconds = ((now & 0xffffffff) * resolution) >> 32 The 32-bit VDSO code also makes a further simplification: it ignores the bottom 32 bits of the tb_to_xs value, which is a 0.64 format binary fraction. Doing so gets rid of 4 multiply instructions. Assuming a timebase frequency of 1GHz or less and an update interval of no more than 10ms, the upper 32 bits of tb_to_xs will be at least 4503599, so the error from ignoring the low 32 bits will be at most 2.2ns, which is more than an order of magnitude less than the time taken to do gettimeofday or clock_gettime on our fastest processors, so there is no possibility of seeing inconsistent values due to this. This also moves update_gtod() down next to its only caller, and makes update_vsyscall use the time passed in via the wall_time argument rather than accessing xtime directly. At present, wall_time always points to xtime, but that could change in future. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
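The arithmetic in this commit message can be checked with a small user-space sketch. It is not the VDSO assembly; `xsec_to_usec` and `split_now` are hypothetical helper names. The first function shows why the old xsec scheme can step back by one microsecond (1049 xsecs truncate to 1000 us, 1048 xsecs to 999 us); the second applies the conversion quoted above, `seconds = now >> 32` and `partseconds = ((now & 0xffffffff) * resolution) >> 32`.

```c
#include <assert.h>
#include <stdint.h>

/* Old scheme: sub-second time in xsecs (2^-20 s), truncated to usec. */
static uint32_t xsec_to_usec(uint32_t xsec)
{
    return (uint32_t)(((uint64_t)xsec * 1000000u) >> 20);
}

/* New scheme: 'now' is a 32.32 fixed-point count of seconds. */
struct split_time {
    uint64_t seconds;
    uint32_t part; /* microseconds or nanoseconds, per 'resolution' */
};

static struct split_time split_now(uint64_t now, uint32_t resolution)
{
    struct split_time t;

    /* The message names exactly these two resolutions. */
    assert(resolution == 1000000u || resolution == 1000000000u);

    t.seconds = now >> 32;
    /* The fraction is below 2^32 and resolution at most 10^9,
     * so the product fits comfortably in 64 bits. */
    t.part = (uint32_t)(((now & 0xffffffffu) * resolution) >> 32);
    return t;
}
```

For instance, a `now` value of five and a half seconds (`(5 << 32) | 0x80000000`) splits into 5 s plus 500000 us or 500000000 ns, depending on the resolution passed in.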
| * | timkeeping: Fix update_vsyscall to provide wall_to_monotonic offset (John Stultz, 2010-07-27, 1 file, -4/+4)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | update_vsyscall() did not provide the wall_to_monotonic offset, so arch specific implementations tend to reference wall_to_monotonic directly. This limits future cleanups in the timekeeping core, so this patch fixes the update_vsyscall interface to provide wall_to_monotonic, allowing wall_to_monotonic to be made static as planned in Documentation/feature-removal-schedule.txt Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Anton Blanchard <anton@samba.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Tony Luck <tony.luck@intel.com> LKML-Reference: <1279068988-21864-7-git-send-email-johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | powerpc: Cleanup xtime usage (John Stultz, 2010-07-27, 1 file, -4/+4)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This removes powerpc's direct xtime usage, allowing for further generic timekeeping cleanups. Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Anton Blanchard <anton@samba.org> LKML-Reference: <1279068988-21864-6-git-send-email-johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | powerpc: Simplify update_vsyscall (John Stultz, 2010-07-27, 1 file, -30/+25)
| |/ | | | | | | | | | | | | | | | | | | | | | | Currently powerpc's update_vsyscall calls an inline update_gtod. However, both are straightforward, and there are no other users, so this patch merges update_gtod into update_vsyscall. Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Anton Blanchard <anton@samba.org> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1279068988-21864-5-git-send-email-johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | Merge branch 'sched-core-for-linus' of ↵ (Linus Torvalds, 2010-08-06, 1 file, -0/+11)
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (27 commits) sched: Use correct macro to display sched_child_runs_first in /proc/sched_debug sched: No need for bootmem special cases sched: Revert nohz_ratelimit() for now sched: Reduce update_group_power() calls sched: Update rq->clock for nohz balanced cpus sched: Fix spelling of sibling sched, cpuset: Drop __cpuexit from cpu hotplug callbacks sched: Fix the racy usage of thread_group_cputimer() in fastpath_timer_check() sched: run_posix_cpu_timers: Don't check ->exit_state, use lock_task_sighand() sched: thread_group_cputime: Simplify, document the "alive" check sched: Remove the obsolete exit_state/signal hacks sched: task_tick_rt: Remove the obsolete ->signal != NULL check sched: __sched_setscheduler: Read the RLIMIT_RTPRIO value lockless sched: Fix comments to make them DocBook happy sched: Fix fix_small_capacity powerpc: Exclude arch_sd_sibiling_asym_packing() on UP powerpc: Enable asymmetric SMT scheduling on POWER7 sched: Add asymmetric group packing option for sibling domain sched: Fix capacity calculations for SMT4 sched: Change nohz idle load balancing logic to push model ...
| * \ Merge branch 'linus' into sched/core  Ingo Molnar  2010-07-21  23  -143/+111
| |\ \
| | | |     Merge reason: Move from the -rc3 to the almost-rc6 base.
| | | |
| | | |     Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | sched: Fix spelling of sibling  Michael Neuling  2010-06-29  1  -1/+1
| | | |
| | | |     No logic changes, only spelling.
| | | |
| | | |     Signed-off-by: Michael Neuling <mikey@neuling.org>
| | | |     Cc: linuxppc-dev@ozlabs.org
| | | |     Cc: David Howells <dhowells@redhat.com>
| | | |     Cc: Peter Zijlstra <peterz@infradead.org>
| | | |     LKML-Reference: <15249.1277776921@neuling.org>
| | | |     Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | Merge commit 'v2.6.35-rc3' into sched/core  Ingo Molnar  2010-06-18  1  -0/+1
| |\ \ \
| | | | |     Merge reason: Update to the latest -rc.
| * | | | powerpc: Exclude arch_sd_sibiling_asym_packing() on UP  Peter Zijlstra  2010-06-09  1  -0/+2
| | | | |
| | | | |     Only SMP systems care about load-balance features, plus this saves some .text space on UP and also fixes the build.
| | | | |
| | | | |     Reported-by: Michael Ellerman <michael@ellerman.id.au>
| | | | |     Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
| | | | |     Cc: Michael Neuling <mikey@neuling.org>
| | | | |     LKML-Reference: <tip-76cbd8a8f8b0dddbff89a6708bd5bd13c0d21a00@git.kernel.org>
| | | | |     Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | powerpc: Enable asymmetric SMT scheduling on POWER7  Michael Neuling  2010-06-09  1  -0/+9
| | | | |
| | | | |     The POWER7 core has dynamic SMT mode switching which is controlled by the hypervisor. There are 3 SMT modes:
| | | | |
| | | | |       SMT1 uses thread 0
| | | | |       SMT2 uses threads 0 & 1
| | | | |       SMT4 uses threads 0, 1, 2 & 3
| | | | |
| | | | |     When in any particular SMT mode, all threads have the same performance as each other (ie. at any moment in time, all threads perform the same).
| | | | |
| | | | |     The SMT mode switching works such that when linux has threads 2 & 3 idle and 0 & 1 active, it will cede (H_CEDE hypercall) threads 2 and 3 in the idle loop and the hypervisor will automatically switch to SMT2 for that core (independent of other cores). The opposite is not true, so if threads 0 & 1 are idle and 2 & 3 are active, we will stay in SMT4 mode.
| | | | |
| | | | |     Similarly if thread 0 is active and threads 1, 2 & 3 are idle, we'll go into SMT1 mode.
| | | | |
| | | | |     If we can get the core into a lower SMT mode (SMT1 is best), the threads will perform better (since they share less core resources). Hence when we have idle threads, we want them to be the higher ones.
| | | | |
| | | | |     This adds a feature bit for asymmetric packing to powerpc and then enables it on POWER7.
| | | | |
| | | | |     Signed-off-by: Michael Neuling <mikey@neuling.org>
| | | | |     Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
| | | | |     Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| | | | |     Cc: Paul Mackerras <paulus@samba.org>
| | | | |     Cc: linuxppc-dev@ozlabs.org
| | | | |     LKML-Reference: <20100608045702.31FB5CC8C7@localhost.localdomain>
| | | | |     Signed-off-by: Ingo Molnar <mingo@elte.hu>
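The SMT mode rule described in the commit message above can be sketched roughly as follows. This is only an illustration of the stated behaviour, not code from the patch: the names `smt_mode` and `smt_mode_for` are hypothetical, and two cases the message does not spell out (only thread 1 active, and all threads idle) are assumed here to map to SMT2 and SMT1 respectively.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not kernel or hypervisor APIs. */
enum smt_mode { SMT1 = 1, SMT2 = 2, SMT4 = 4 };

/*
 * active[i] is true when hardware thread i (0..3) of a core is running
 * work, and false when it has been ceded in the idle loop.
 *
 * Per the commit message, the mode is driven by the highest active
 * thread: idle threads only allow a lower mode when they are the
 * higher-numbered ones.
 */
static enum smt_mode smt_mode_for(const bool active[4])
{
	if (active[2] || active[3])
		return SMT4;	/* a high thread is busy: stay in SMT4 */
	if (active[1])
		return SMT2;	/* threads 2 & 3 idle: drop to SMT2 */
	return SMT1;		/* assumption: also covers the all-idle case */
}
```

With this model, two busy threads placed on threads 0 & 1 give SMT2, while the same two busy threads placed on threads 2 & 3 leave the core in SMT4, which is why the scheduler prefers to pack work onto the lower-numbered threads.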
* | | | | Merge branch 'perf-core-for-linus' of ↵  Linus Torvalds  2010-08-06  3  -60/+36
|\ \ \ \ \
| | | | | |     git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
| | | | | |
| | | | | |     * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (162 commits)
| | | | | |       tracing/kprobes: unregister_trace_probe needs to be called under mutex
| | | | | |       perf: expose event__process function
| | | | | |       perf events: Fix mmap offset determination
| | | | | |       perf, powerpc: fsl_emb: Restore setting perf_sample_data.period
| | | | | |       perf, powerpc: Convert the FSL driver to use local64_t
| | | | | |       perf tools: Don't keep unreferenced maps when unmaps are detected
| | | | | |       perf session: Invalidate last_match when removing threads from rb_tree
| | | | | |       perf session: Free the ref_reloc_sym memory at the right place
| | | | | |       x86,mmiotrace: Add support for tracing STOS instruction
| | | | | |       perf, sched migration: Librarize task states and event headers helpers
| | | | | |       perf, sched migration: Librarize the GUI class
| | | | | |       perf, sched migration: Make the GUI class client agnostic
| | | | | |       perf, sched migration: Make it vertically scrollable
| | | | | |       perf, sched migration: Parameterize cpu height and spacing
| | | | | |       perf, sched migration: Fix key bindings
| | | | | |       perf, sched migration: Ignore unhandled task states
| | | | | |       perf, sched migration: Handle ignored migrate out events
| | | | | |       perf: New migration tool overview
| | | | | |       tracing: Drop cpparg() macro
| | | | | |       perf: Use tracepoint_synchronize_unregister() to flush any pending tracepoint call
| | | | | |       ...
| | | | | |
| | | | | |     Fix up trivial conflicts in Makefile and drivers/cpufreq/cpufreq.c
| * | | | | perf, powerpc: fsl_emb: Restore setting perf_sample_data.period  Scott Wood  2010-08-03  1  -0/+1
| | | | | |
| | | | | |     Commit 6b95ed345b9faa4ab3598a82991968f2e9f851bb changed from a struct initializer to perf_sample_data_init(), but the setting of the .period member was left out.
| | | | | |
| | | | | |     Signed-off-by: Scott Wood <scottwood@freescale.com>
| | | | | |     Cc: stable@kernel.org
| | | | | |     Signed-off-by: Paul Mackerras <paulus@samba.org>
| * | | | | perf, powerpc: Convert the FSL driver to use local64_t  Peter Zijlstra  2010-08-03  1  -14/+14
| | | | | |
| | | | | |     For some reason the FSL driver got left out when we converted perf to use local64_t instead of atomic64_t.
| | | | | |
| | | | | |     Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
| | | | | |     Acked-by: Kumar Gala <galak@kernel.crashing.org>
| | | | | |     Signed-off-by: Paul Mackerras <paulus@samba.org>
| * | | | | Merge commit 'v2.6.35' into perf/core  Ingo Molnar  2010-08-02  2  -4/+4
| |\ \ \ \ \
| | | | | | |     Conflicts:
| | | | | | |       tools/perf/Makefile
| | | | | | |       tools/perf/util/hist.c
| | | | | | |
| | | | | | |     Merge reason: Resolve the conflicts and update to latest upstream.
| | | | | | |
| | | | | | |     Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * \ \ \ \ \ Merge branch 'linus' into perf/core  Ingo Molnar  2010-07-21  18  -79/+89
| |\ \ \ \ \ \
| | | |_|_|/ /
| | |/| | | |
| | | | | | |     Merge reason: Pick up the latest perf fixes.
| | | | | | |
| | | | | | |     Signed-off-by: Ingo Molnar <mingo@elte.hu>