summaryrefslogtreecommitdiffstats
path: root/arch/x86/kernel/paravirt.c
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'x86-vdso-for-linus' of ↵Linus Torvalds2011-08-121-0/+4
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-tip * 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-tip: x86-64: Rework vsyscall emulation and add vsyscall= parameter x86-64: Wire up getcpu syscall x86: Remove unnecessary compile flag tweaks for vsyscall code x86-64: Add vsyscall:emulate_vsyscall trace event x86-64: Add user_64bit_mode paravirt op x86-64, xen: Enable the vvar mapping x86-64: Work around gold bug 13023 x86-64: Move the "user" vsyscall segment out of the data segment. x86-64: Pad vDSO to a page boundary
| * x86-64: Add user_64bit_mode paravirt opAndy Lutomirski2011-08-041-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Three places in the kernel assume that the only long mode CPL 3 selector is __USER_CS. This is not true on Xen -- Xen's sysretq changes cs to the magic value 0xe033. Two of the places are corner cases, but as of "x86-64: Improve vsyscall emulation CS and RIP handling" (c9712944b2a12373cb6ff8059afcfb7e826a6c54), vsyscalls will segfault if called with Xen's extra CS selector. This causes a panic when older init builds die. It seems impossible to make Xen use __USER_CS reliably without taking a performance hit on every system call, so this fixes the tests instead with a new paravirt op. It's a little ugly because ptrace.h can't include paravirt.h. Signed-off-by: Andy Lutomirski <luto@mit.edu> Link: http://lkml.kernel.org/r/f4fcb3947340d9e96ce1054a432f183f9da9db83.1312378163.git.luto@mit.edu Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
* | KVM guest: Add a pv_ops stub for steal timeGlauber Costa2011-07-141-0/+9
|/ | | | | | | | | | | | | | | | This patch adds a function pointer in one of the many paravirt_ops structs, to allow guests to register a steal time function. Besides a steal time function, we also declare two jump_labels. They will be used to allow the steal time code to be easily bypassed when not in use. Signed-off-by: Glauber Costa <glommer@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Tested-by: Eric B Munson <emunson@mgebm.net> CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> CC: Peter Zijlstra <peterz@infradead.org> CC: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* thp: add pmd paravirt opsAndrea Arcangeli2011-01-131-0/+3
| | | | | | | | | | | | | Paravirt ops pmd_update/pmd_update_defer/pmd_set_at. Not all might be necessary (vmware needs pmd_update, Xen needs set_pmd_at, nobody needs pmd_update_defer), but this is to keep full simmetry with pte paravirt ops, which looks cleaner and simpler from a common code POV. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* x86, paravirt: Remove alloc_pmd_clone hook, only used by VMIAlok Kataria2010-08-231-1/+0
| | | | | | | | | | VMI was the only user of the alloc_pmd_clone hook, given that VMI is now removed we can also remove this hook. Signed-off-by: Alok N Kataria <akataria@vmware.com> LKML-Reference: <1282608357.19396.36.camel@ank32.eng.vmware.com> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
* x86, paravirt: Remove kmap_atomic_pte paravirt op.Ian Campbell2010-02-271-4/+0
| | | | | | | | | | | | | | | Now that both Xen and VMI disable allocations of PTE pages from high memory this paravirt op serves no further purpose. This effectively reverts ce6234b5 "add kmap_atomic_pte for mapping highpte pages". Signed-off-by: Ian Campbell <ian.campbell@citrix.com> LKML-Reference: <1267204562-11844-3-git-send-email-ian.campbell@citrix.com> Acked-by: Alok Kataria <akataria@vmware.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* Merge branch 'x86-platform-for-linus' of ↵Linus Torvalds2009-09-181-35/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (38 commits) x86: Move get/set_wallclock to x86_platform_ops x86: platform: Fix section annotations x86: apic namespace cleanup x86: Distangle ioapic and i8259 x86: Add Moorestown early detection x86: Add hardware_subarch ID for Moorestown x86: Add early platform detection x86: Move tsc_init to late_time_init x86: Move tsc_calibration to x86_init_ops x86: Replace the now identical time_32/64.c by time.c x86: time_32/64.c unify profile_pc x86: Move calibrate_cpu to tsc.c x86: Make timer setup and global variables the same in time_32/64.c x86: Remove mca bus ifdef from timer interrupt x86: Simplify timer_ack magic in time_32.c x86: Prepare unification of time_32/64.c x86: Remove do_timer hook x86: Add timer_init to x86_init_ops x86: Move percpu clockevents setup to x86_init_ops x86: Move xen_post_allocator_init into xen_pagetable_setup_done ... Fix up conflicts in arch/x86/include/asm/io_apic.h
| * x86: Move get/set_wallclock to x86_platform_opsFeng Tang2009-09-161-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | get/set_wallclock() have already a set of platform dependent implementations (default, EFI, paravirt). MRST will add another variant. Moving them to platform ops simplifies the existing code and minimizes the effort to integrate new variants. Signed-off-by: Feng Tang <feng.tang@intel.com> LKML-Reference: <new-submission> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * x86: Move tsc_calibration to x86_init_opsThomas Gleixner2009-08-311-1/+0
| | | | | | | | | | | | | | | | | | | | TSC calibration is modified by the vmware hypervisor and paravirt by separate means. Moorestown wants to add its own calibration routine as well. So make calibrate_tsc a proper x86_init_ops function and override it by paravirt or by the early setup of the vmware hypervisor. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * x86: Add timer_init to x86_init_opsThomas Gleixner2009-08-311-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | The timer init code is convoluted with several quirks and the paravirt timer chooser. Figuring out which code path is actually taken is not for the faint hearted. Move the numaq TSC quirk to tsc_pre_init x86_init_ops function and replace the paravirt time chooser and the remaining x86 quirk with a simple x86_init_ops function. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * x86: Move percpu clockevents setup to x86_init_opsThomas Gleixner2009-08-311-2/+0
| | | | | | | | | | | | | | | | | | | | paravirt overrides the setup of the default apic timers as per cpu timers. Moorestown needs to override that as well. Move it to x86_init_ops setup and create a separate x86_cpuinit struct which holds the function for the secondary evtl. hotplugabble CPUs. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * x86: Move paravirt pagetable_setup to x86_init_opsThomas Gleixner2009-08-311-7/+0
| | | | | | | | | | | | Replace more paravirt hackery by proper x86_init_ops. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * x86: Move paravirt banner printout to x86_init_opsThomas Gleixner2009-08-311-9/+1
| | | | | | | | | | | | | | | | Replace another obscure paravirt magic and move it to x86_init_ops. Such a hook is also useful for embedded and special hardware. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * x86: Replace ARCH_SETUP by a proper x86_init_opsThomas Gleixner2009-08-311-1/+0
| | | | | | | | | | | | | | ARCH_SETUP is a horrible leftover from the old arch/i386 mach support code. It still has a lonely user in xen. Move it to x86_init_ops. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * x86: Move irq_init to x86_init_opsThomas Gleixner2009-08-311-6/+0
| | | | | | | | | | | | | | | | | | irq_init is overridden by x86_quirks and by paravirts. Unify the whole mess and make it an unconditional x86_init_ops function which defaults to the standard function and can be overridden by the early platform code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * x86: Move memory_setup to x86_init_opsThomas Gleixner2009-08-271-6/+0
| | | | | | | | | | | | | | | | | | memory_setup is overridden by x86_quirks and by paravirts with weak functions and quirks. Unify the whole mess and make it an unconditional x86_init_ops function which defaults to the standard function and can be overridden by the early platform code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | x86, msr: Rewrite AMD rd/wrmsr variantsBorislav Petkov2009-08-311-1/+0
| | | | | | | | | | | | | | | | | | Switch them to native_{rd,wr}msr_safe_regs and remove pv_cpu_ops.read_msr_amd. Signed-off-by: Borislav Petkov <petkovbb@gmail.com> LKML-Reference: <1251705011-18636-2-git-send-email-petkovbb@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* | x86, msr: Add rd/wrmsr interfaces with preset registersBorislav Petkov2009-08-311-0/+2
|/ | | | | | | | | | | | | native_{rdmsr,wrmsr}_safe_regs are two new interfaces which allow presetting of a subset of eight x86 GPRs before executing the rd/wrmsr instructions. This is needed at least on AMD K8 for accessing an erratum workaround MSR. Originally based on an idea by H. Peter Anvin. Signed-off-by: Borislav Petkov <petkovbb@gmail.com> LKML-Reference: <1251705011-18636-1-git-send-email-petkovbb@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* Merge branch 'x86-xen-for-linus' of ↵Linus Torvalds2009-06-101-30/+26
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-xen-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (42 commits) xen: cache cr0 value to avoid trap'n'emulate for read_cr0 xen/x86-64: clean up warnings about IST-using traps xen/x86-64: fix breakpoints and hardware watchpoints xen: reserve Xen start_info rather than e820 reserving xen: add FIX_TEXT_POKE to fixmap lguest: update lazy mmu changes to match lguest's use of kvm hypercalls xen: honour VCPU availability on boot xen: add "capabilities" file xen: drop kexec bits from /sys/hypervisor since kexec isn't implemented yet xen/sys/hypervisor: change writable_pt to features xen: add /sys/hypervisor support xen/xenbus: export xenbus_dev_changed xen: use device model for suspending xenbus devices xen: remove suspend_cancel hook xen/dev-evtchn: clean up locking in evtchn xen: export ioctl headers to userspace xen: add /dev/xen/evtchn driver xen: add irq_from_evtchn xen: clean up gate trap/interrupt constants xen: set _PAGE_NX in __supported_pte_mask before pagetable construction ...
| * Merge commit 'origin/master' into for-linus/xen/masterJeremy Fitzhardinge2009-04-071-1/+0
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * commit 'origin/master': (4825 commits) Fix build errors due to CONFIG_BRANCH_TRACER=y parport: Use the PCI IRQ if offered tty: jsm cleanups Adjust path to gpio headers KGDB_SERIAL_CONSOLE check for module Change KCONFIG name tty: Blackin CTS/RTS Change hardware flow control from poll to interrupt driven Add support for the MAX3100 SPI UART. lanana: assign a device name and numbering for MAX3100 serqt: initial clean up pass for tty side tty: Use the generic RS485 ioctl on CRIS tty: Correct inline types for tty_driver_kref_get() splice: fix deadlock in splicing to file nilfs2: support nanosecond timestamp nilfs2: introduce secondary super block nilfs2: simplify handling of active state of segments nilfs2: mark minor flag for checkpoint created by internal operation nilfs2: clean up sketch file nilfs2: super block operations fix endian bug ... Conflicts: arch/x86/include/asm/thread_info.h arch/x86/lguest/boot.c drivers/xen/manage.c
| * | x86/paravirt: use percpu_ rather than __get_cpu_varJeremy Fitzhardinge2009-03-291-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: minor optimisation percpu_read/write is a slightly more direct way of getting to percpu data. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
| * | x86/paravirt: allow preemption with lazy mmu modeJeremy Fitzhardinge2009-03-291-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: remove obsolete checks, simplification Lift restrictions on preemption with lazy mmu mode, as it is now allowed. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
| * | x86/paravirt: finish change from lazy cpu to context switch start/endJeremy Fitzhardinge2009-03-291-8/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: fix lazy context switch API Pass the previous and next tasks into the context switch start end calls, so that the called functions can properly access the task state (esp in end_context_switch, in which the next task is not yet completely current). Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
| * | x86/paravirt: flush pending mmu updates on context switchJeremy Fitzhardinge2009-03-291-3/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: allow preemption during lazy mmu updates If we're in lazy mmu mode when context switching, leave lazy mmu mode, but remember the task's state in TIF_LAZY_MMU_UPDATES. When we resume the task, check this flag and re-enter lazy mmu mode if its set. This sets things up for allowing lazy mmu mode while preemptible, though that won't actually be active until the next change. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
| * | x86/pvops: replace arch_enter_lazy_cpu_mode with arch_start_context_switchJeremy Fitzhardinge2009-03-291-13/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: simplification, prepare for later changes Make lazy cpu mode more specific to context switching, so that it makes sense to do more context-switch specific things in the callbacks. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
| * | x86/paravirt: remove lazy mode in interruptsJeremy Fitzhardinge2009-03-291-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: simplification, robustness Make paravirt_lazy_mode() always return PARAVIRT_LAZY_NONE when in an interrupt. This prevents interrupt code from accidentally inheriting an outer lazy state, and instead does everything synchronously. Outer batched operations are left deferred. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de>
* | | x86: Fix performance regression caused by paravirt_ops on native kernelsJeremy Fitzhardinge2009-05-151-0/+2
| |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Xiaohui Xin and some other folks at Intel have been looking into what's behind the performance hit of paravirt_ops when running native. It appears that the hit is entirely due to the paravirtualized spinlocks introduced by: | commit 8efcbab674de2bee45a2e4cdf97de16b8e609ac8 | Date: Mon Jul 7 12:07:51 2008 -0700 | | paravirt: introduce a "lock-byte" spinlock implementation The extra call/return in the spinlock path is somehow causing an increase in the cycles/instruction of somewhere around 2-7% (seems to vary quite a lot from test to test). The working theory is that the CPU's pipeline is getting upset about the call->call->locked-op->return->return, and seems to be failing to speculate (though I haven't seen anything definitive about the precise reasons). This doesn't entirely make sense, because the performance hit is also visible on unlock and other operations which don't involve locked instructions. But spinlock operations clearly swamp all the other pvops operations, even though I can't imagine that they're nearly as common (there's only a .05% increase in instructions executed). If I disable just the pv-spinlock calls, my tests show that pvops is identical to non-pvops performance on native (my measurements show that it is actually about .1% faster, but Xiaohui shows a .05% slowdown). Summary of results, averaging 10 runs of the "mmperf" test, using a no-pvops build as baseline: nopv Pv-nospin Pv-spin CPU cycles 100.00% 99.89% 102.18% instructions 100.00% 100.10% 100.15% CPI 100.00% 99.79% 102.03% cache ref 100.00% 100.84% 100.28% cache miss 100.00% 90.47% 88.56% cache miss rate 100.00% 89.72% 88.31% branches 100.00% 99.93% 100.04% branch miss 100.00% 103.66% 107.72% branch miss rt 100.00% 103.73% 107.67% wallclock 100.00% 99.90% 102.20% The clear effect here is that the 2% increase in CPI is directly reflected in the final wallclock time. (The other interesting effect is that the more ops are out of line calls via pvops, the lower the cache access and miss rates. Not too surprising, but it suggests that the non-pvops kernel is over-inlined. On the flipside, the branch misses go up correspondingly...) So, what's the fix? Paravirt patching turns all the pvops calls into direct calls, so _spin_lock etc do end up having direct calls. For example, the compiler generated code for paravirtualized _spin_lock is: <_spin_lock+0>: mov %gs:0xb4c8,%rax <_spin_lock+9>: incl 0xffffffffffffe044(%rax) <_spin_lock+15>: callq *0xffffffff805a5b30 <_spin_lock+22>: retq The indirect call will get patched to: <_spin_lock+0>: mov %gs:0xb4c8,%rax <_spin_lock+9>: incl 0xffffffffffffe044(%rax) <_spin_lock+15>: callq <__ticket_spin_lock> <_spin_lock+20>: nop; nop /* or whatever 2-byte nop */ <_spin_lock+22>: retq One possibility is to inline _spin_lock, etc, when building an optimised kernel (ie, when there's no spinlock/preempt instrumentation/debugging enabled). That will remove the outer call/return pair, returning the instruction stream to a single call/return, which will presumably execute the same as the non-pvops case. The downsides arel 1) it will replicate the preempt_disable/enable code at eack lock/unlock callsite; this code is fairly small, but not nothing; and 2) the spinlock definitions are already a very heavily tangled mass of #ifdefs and other preprocessor magic, and making any changes will be non-trivial. The other obvious answer is to disable pv-spinlocks. Making them a separate config option is fairly easy, and it would be trivial to enable them only when Xen is enabled (as the only non-default user). But it doesn't really address the common case of a distro build which is going to have Xen support enabled, and leaves the open question of whether the native performance cost of pv-spinlocks is worth the performance improvement on a loaded Xen system (10% saving of overall system CPU when guests block rather than spin). Still it is a reasonable short-term workaround. [ Impact: fix pvops performance regression when running native ] Analysed-by: "Xin Xiaohui" <xiaohui.xin@intel.com> Analysed-by: "Li Xin" <xin.li@intel.com> Analysed-by: "Nakajima Jun" <jun.nakajima@intel.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Xen-devel <xen-devel@lists.xensource.com> LKML-Reference: <4A0B62F7.5030802@goop.org> [ fixed the help text ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | x86: with the last user gone, remove set_pte_presentJeremy Fitzhardinge2009-03-191-1/+0
|/ | | | | | | | | | | | | | | | Impact: cleanup set_pte_present() is no longer used, directly or indirectly, so remove it. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Xen-devel <xen-devel@lists.xensource.com> Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Avi Kivity <avi@redhat.com> LKML-Reference: <1237406613-2929-2-git-send-email-jeremy@goop.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* x86: refactor x86_quirks supportIngo Molnar2009-02-231-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | Impact: cleanup Make x86_quirks support more transparent. The highlevel methods are now named: extern void x86_quirk_pre_intr_init(void); extern void x86_quirk_intr_init(void); extern void x86_quirk_trap_init(void); extern void x86_quirk_pre_time_init(void); extern void x86_quirk_time_init(void); This makes it clear that if some platform extension has to do something here that it is considered ... weird, and is discouraged. Also remove arch_hooks.h and move it into setup.h (and other header files where appropriate). Signed-off-by: Ingo Molnar <mingo@elte.hu>
*-. Merge branches 'x86/paravirt', 'x86/pat', 'x86/setup-v2', 'x86/subarch', ↵Ingo Molnar2009-02-131-0/+26
|\ \ | | | | | | | | | 'x86/uaccess' and 'x86/urgent' into x86/core
| | * x86: warn if arch_flush_lazy_mmu_cpu is called in preemptible contextThomas Gleixner2009-02-121-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: Catch cases where lazy MMU state is active in a preemtible context arch_flush_lazy_mmu_cpu() has been changed to disable preemption so the checks in enter/leave will never trigger. Put the preemtible() check into arch_flush_lazy_mmu_cpu() to catch such cases. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| | * x86/paravirt: make arch_flush_lazy_mmu/cpu disable preemptionJeremy Fitzhardinge2009-02-121-0/+24
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: avoid access to percpu vars in preempible context They are intended to be used whenever there's the possibility that there's some stale state which is going to be overwritten with a queued update, or to force a state change when we may be in lazy mode. Either way, we could end up calling it with preemption enabled, so wrap the functions in their own little preempt-disable section so they can be safely called in any context (though preemption should never be enabled if we're actually in a lazy state). (Move out of line to avoid #include dependencies.) Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | x86/paravirt: use callee-saved convention for pte_val/make_pte/etcJeremy Fitzhardinge2009-01-301-41/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | Impact: Optimization In the native case, pte_val, make_pte, etc are all just identity functions, so there's no need to clobber a lot of registers over them. (This changes the 32-bit callee-save calling convention to return both EAX and EDX so functions can return 64-bit values.) Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* | x86/paravirt: add register-saving thunks to reduce caller register pressureJeremy Fitzhardinge2009-01-301-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: Optimization One of the problems with inserting a pile of C calls where previously there were none is that the register pressure is greatly increased. The C calling convention says that the caller must expect a certain set of registers may be trashed by the callee, and that the callee can use those registers without restriction. This includes the function argument registers, and several others. This patch seeks to alleviate this pressure by introducing wrapper thunks that will do the register saving/restoring, so that the callsite doesn't need to worry about it, but the callee function can be conventional compiler-generated code. In many cases (particularly performance-sensitive cases) the callee will be in assembler anyway, and need not use the compiler's calling convention. Standard calling convention is: arguments return scratch x86-32 eax edx ecx eax ? x86-64 rdi rsi rdx rcx rax r8 r9 r10 r11 The thunk preserves all argument and scratch registers. The return register is not preserved, and is available as a scratch register for unwrapped callee code (and of course the return value). Wrapped function pointers are themselves wrapped in a struct paravirt_callee_save structure, in order to get some warning from the compiler when functions with mismatched calling conventions are used. The most common paravirt ops, both statically and dynamically, are interrupt enable/disable/save/restore, so handle them first. This is particularly easy since their calls are handled specially anyway. XXX Deal with VMI. What's their calling convention? Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* | x86/pvops: add a paravirt_ident functions to allow special patchingJeremy Fitzhardinge2009-01-301-9/+66
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: Optimization Several paravirt ops implementations simply return their arguments, the most obvious being the make_pte/pte_val class of operations on native. On 32-bit, the identity function is literally a no-op, as the calling convention uses the same registers for the first argument and return. On 64-bit, it can be implemented with a single "mov". This patch adds special identity functions for 32 and 64 bit argument, and machinery to recognize them and replace them with either nops or a mov as appropriate. At the moment, the only users for the identity functions are the pagetable entry conversion functions. The result is a measureable improvement on pagetable-heavy benchmarks (2-3%, reducing the pvops overhead from 5 to 2%). Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* | x86/pvops: remove pte_flags pvopJeremy Fitzhardinge2009-01-221-1/+0
|/ | | | | | | | | | | pte_flags() was introduced as a new pvop in order to extract just the flags portion of a pte, which is a potentially cheaper operation than extracting the page number as well. It turns out this operation is not needed, because simply using a mask to extract the flags from a pte is sufficient for all current users. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Merge branch 'linus' into x86/xenIngo Molnar2008-10-121-2/+1
|\ | | | | | | | | | | | | Conflicts: arch/x86/kernel/cpu/common.c arch/x86/kernel/process_64.c arch/x86/xen/enlighten.c
| * Merge branch 'x86/apic' into x86-v28-for-linus-phase4-BIngo Molnar2008-10-111-2/+0
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: arch/x86/kernel/apic_32.c arch/x86/kernel/apic_64.c arch/x86/kernel/setup.c drivers/pci/intel-iommu.c include/asm-x86/cpufeature.h include/asm-x86/dma-mapping.h
| | * Merge branch 'linus' into x86/x2apicIngo Molnar2008-07-251-1/+1
| | |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/pci/dmar.c Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * \ Merge branch 'linus' into x86/x2apicIngo Molnar2008-07-221-0/+28
| | |\ \
| | * | | x86: let 32bit use apic_ops too - fixYinghai Lu2008-07-141-5/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | fix for pv - clean up the namespace there too. Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | x64, x2apic/intr-remap: Interrupt-remapping and x2apic supportSuresh Siddha2008-07-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On Thu, Jul 10, 2008 at 12:53:20PM -0700, Ingo Molnar wrote: > > Btw., i threw it at the -tip test-cluster and got back a quick build > bugreport: > > arch/x86/xen/enlighten.c: In function 'xen_patch': > arch/x86/xen/enlighten.c:1084: warning: label 'patch_site' defined but not used > arch/x86/xen/enlighten.c: At top level: > arch/x86/xen/enlighten.c:1272: error: expected identifier before '(' token > arch/x86/xen/enlighten.c:1273: error: expected '}' before '.' token > arch/x86/kernel/paravirt.c:376:2: error: invalid preprocessing directive > #ifndedarch/x86/kernel/paravirt.c:384:2: error: #endif without #if > > with this config: > > http://redhat.com/~mingo/misc/config-Thu_Jul_10_21_43_28_CEST_2008.bad fix the typo. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: "Siddha Cc: Suresh B" <suresh.b.siddha@intel.com> Cc: "akpm@linux-foundation.org" <akpm@linux-foundation.org> Cc: "arjan@linux.intel.com" <arjan@linux.intel.com> Cc: "andi@firstfloor.org" <andi@firstfloor.org> Cc: "ebiederm@xmission.com" <ebiederm@xmission.com> Cc: "jbarnes@virtuousgeek.org" <jbarnes@virtuousgeek.org> Cc: "steiner@sgi.com" <steiner@sgi.com> Cc: jeremy@goop.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | x64, x2apic/intr-remap: basic apic ops supportSuresh Siddha2008-07-121-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce basic apic operations which handle the apic programming. This will be used later to introduce another specific operations for x2apic. For the perfomance critial accesses like IPI's, EOI etc, we use the native operations as they are already referenced by different indirections like genapic, irq_chip etc. 64bit Paravirt ops can also define their apic operations accordingly. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: akpm@linux-foundation.org Cc: arjan@linux.intel.com Cc: andi@firstfloor.org Cc: ebiederm@xmission.com Cc: jbarnes@virtuousgeek.org Cc: steiner@sgi.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | Merge commit 'v2.6.27-rc7' into x86/debugIngo Molnar2008-09-221-1/+1
| |\ \ \ \
| | * | | | x86: export pv_lock_ops non-GPLJeremy Fitzhardinge2008-08-211-1/+1
| | | |_|/ | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | None of the spinlock API is exported GPL, so there's no reason for pv_lock_ops to be. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: drago01 <drago01@gmail.com>
| * | | | x86_64: printout msr -v2Yinghai Lu2008-08-221-0/+1
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commandline show_msr=1 for bsp, show_msr=32 for all 32 cpus. [ mingo@elte.hu: added documentation ] Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | | Merge branch 'x86/spinlocks' into x86/xenIngo Molnar2008-07-311-23/+0
|\ \ \ \
| * | | | x86: split spinlock implementations out into their own filesJeremy Fitzhardinge2008-07-241-23/+0
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ftrace requires certain low-level code, like spinlocks and timestamps, to be compiled without -pg in order to avoid infinite recursion. This patch splits out the core paravirt spinlocks and the Xen spinlocks into separate files which can be compiled without -pg. Also do xen/time.c while we're about it. As a result, we can now use ftrace within a Xen domain. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | | x86/paravirt/xen: properly fill out the ldt opsJeremy Fitzhardinge2008-07-241-0/+4
|/ / / | | | | | | | | | | | | | | | | | | | | | | | | LTP testing showed that Xen does not properly implement sys_modify_ldt(). This patch does the final little bits needed to make the ldt work properly. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | x86: fix pte_flags() to only return flags, fix lguest (updated)Rusty Russell2008-07-221-1/+1
| |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (Jeremy said: rusty: use PTE_MASK rusty: use PTE_MASK rusty: use PTE_MASK When I asked: jsgf: does that include the NX flag? He responded eloquently: rusty: use PTE_MASK rusty: use PTE_MASK yes, it's the official constant of masking flags out of ptes ) Change a15af1c9ea2750a9ff01e51615c45950bad8221b 'x86/paravirt: add pte_flags to just get pte flags' removed lguest's private pte_flags() in favor of a generic one. Unfortunately, the generic one doesn't filter out the non-flags bits: this results in lguest creating corrupt shadow page tables and blowing up host memory. Since noone is supposed to use the pfn part of pte_flags(), it seems safest to always do the filtering. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-and-morning-tea-spilled-by: Ingo Molnar <mingo@elte.hu>
OpenPOWER on IntegriCloud