| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
| |
Have a pointer to an allocated region inside struct kvm.
[alex: fix ppc book 3s]
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Because we now emulate the DEC interrupt according to real life behavior,
there's no need to keep the AGGRESSIVE_DEC hack around.
Let's just remove it.
Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Acked-by: Hollis Blanchard <hollis@penguinppc.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We treated the DEC interrupt like an edge based one. This is not true for
Book3s. The DEC keeps firing until mtdec is issued again and thus clears
the interrupt line.
So let's implement this logic in KVM too. This patch moves the line clearing
from the firing of the interrupt to the mtdec emulation.
This makes PPC64 guests work without AGGRESSIVE_DEC defined.
Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Acked-by: Hollis Blanchard <hollis@penguinppc.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
We're using a switch table to find the irqprio that belongs to a specific
interrupt vector. This table is part of the interrupt inject logic.
Since we'll add a new function to stop interrupts, let's move this table
out of the injection logic into a separate function.
Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Acked-by: Hollis Blanchard <hollis@penguinppc.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
|
|
|
|
| |
s390 doesn't have mmio, this will simplify ifdefing it out.
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The tsc_offset adjustment in svm_vcpu_load is executed
unconditionally even if Linux considers the host tsc as
stable. This causes a Linux guest detecting an unstable tsc
in any case.
This patch removes the tsc_offset adjustment if the host tsc
is stable. The guest will now get the benefit of a stable
tsc too.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
|
|
|
|
|
| |
Before enabling, execution of "rdtscp" in guest would result in #UD.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Sometime, we need to adjust some state in order to reflect guest CPUID
setting, e.g. if we don't expose rdtscp to guest, we won't want to enable
it on hardware. cpuid_update() is introduced for this purpose.
Also export kvm_find_cpuid_entry() for later use.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
KVM need vsyscall_init() to initialize MSR_TSC_AUX before it read the value.
Per Avi's suggestion, this patch raised vsyscall priority on hotplug notifier
chain, to 30.
CC: Ingo Molnar <mingo@elte.hu>
CC: linux-kernel@vger.kernel.org
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
shared_msr_global saved host value of relevant MSRs, but it have an
assumption that all MSRs it tracked shared the value across the different
CPUs. It's not true with some MSRs, e.g. MSR_TSC_AUX.
Extend it to per CPU to provide the support of MSR_TSC_AUX, and more
alike MSRs.
Notice now the shared_msr_global still have one assumption: it can only deal
with the MSRs that won't change in host after KVM module loaded.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
|
|
|
|
|
| |
It's no longer necessary.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
ept_update_paging_mode_cr4() accesses vcpu->arch.cr4 directly, which usually
needs to be accessed via kvm_read_cr4(). In this case, we can't, since cr4
is in the process of being updated. Instead of adding inane comments, fold
the function into its caller (vmx_set_cr4), so it can use the not-yet-committed
cr4 directly.
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
|
|
|
|
|
| |
We make no use of cr4.pge if ept is enabled, but the guest does (to flush
global mappings, as with vmap()), so give the guest ownership of this bit.
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
|
|
|
|
|
|
| |
Instead of specifying the bits which we want to trap on, specify the bits
which we allow the guest to change transparently. This is safer wrt future
changes to cr4.
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Some bits of cr4 can be owned by the guest on vmx, so when we read them,
we copy them to the vcpu structure. In preparation for making the set of
guest-owned bits dynamic, use helpers to access these bits so we don't need
to know where the bit resides.
No changes to svm since all bits are host-owned there.
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
|
|
|
|
| |
They have no place in common code.
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
We don't support these instructions, but guest can execute them even if the
feature('monitor') haven't been exposed in CPUID. So we would trap and inject
a #UD if guest try this way.
Cc: stable@kernel.org
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
|
|
|
|
|
| |
In the past we've had errors of single-bit in the other two cases; the
printk() may confirm it for the third case (many->many).
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
|
|
|
|
|
| |
Windows 2003 uses task switch to triple fault and reboot (the other
exception being reserved pdptrs bits).
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Move Double-Fault generation logic out of page fault
exception generating function to cover more generic case.
Signed-off-by: Eddie Dong <eddie.dong@intel.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|\
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, uv: Remove recursion in uv_heartbeat_enable()
x86, uv: uv_global_gru_mmr_address() macro fix
x86, uv: Add serial number parameter to uv_bios_get_sn_info()
|
| |
| |
| |
| |
| |
| |
| |
| | |
The recursion is not needed and does not improve readability.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
LKML-Reference: <4B45F13E.3040202@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Fix bug in uv_global_gru_mmr_address macro. Macro failed
to cast an int value to a long prior to a left shift > 32.
Signed-off-by: Jack Steiner <steiner@sgi.com>
LKML-Reference: <20100107161240.GA2610@sgi.com>
Cc: <stable@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Add system_serial_number to the information returned by
uv_bios_get_sn_info() UV BIOS call.
Signed-off-by: Russ Anderson <rja@sgi.com>
LKML-Reference: <20091217165323.GA30774@sgi.com>
Cc: Jack Steiner <steiner@sgi.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-ptrace-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, ptrace: Remove set_stopped_child_used_math() in [x]fpregs_set
x86, ptrace: Simplify xstateregs_get()
ptrace: Fix ptrace_regset() comments and diagnose errors specifically
parisc: Disable CONFIG_HAVE_ARCH_TRACEHOOK
ptrace: Add support for generic PTRACE_GETREGSET/PTRACE_SETREGSET
x86, ptrace: regset extensions to support xstate
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
init_fpu() already ensures that the used_math() is set for the stopped child.
Remove the redundant set_stopped_child_used_math() in [x]fpregs_set()
Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20100222225240.642169080@sbs-t61.sc.intel.com>
Acked-by: Rolan McGrath <roland@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
48 bytes (bytes 464..511) of the xstateregs payload come from the
kernel defined structure (xstate_fx_sw_bytes). Rest comes from the
xstate regs structure in the thread struct. Instead of having multiple
user_regset_copyout()'s, simplify the xstateregs_get() by first
copying the SW bytes into the xstate regs structure in the thread structure
and then using one user_regset_copyout() to copyout the xstateregs.
Requested-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20100222225240.494688491@sbs-t61.sc.intel.com>
Acked-by: Roland McGrath <roland@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Oleg Nesterov <oleg@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
> FYI, this commit broke tip:master on PARISC (other architectures are fine):
>
> kernel/built-in.o: In function `ptrace_request':
> (.text.ptrace_request+0x2cc): undefined reference to `task_user_regset_view'
This means that parisc failed to meet the documented requirements for
setting CONFIG_HAVE_ARCH_TRACEHOOK, but set it anyway. If arch folks don't
follow the specs, it defeats the whole purpose of having clear statements
of requirements for arch code.
Until parisc finishes up its requirements, disable CONFIG_HAVE_ARCH_TRACEHOOK.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <20100222183707.8749D64C@magilla.sf.frob.com>
Cc: <linux-parisc@vger.kernel.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Helge Deller <deller@gmx.de>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Add the xstate regset support which helps extend the kernel ptrace and the
core-dump interfaces to support AVX state etc.
This regset interface is designed to support all the future state that gets
supported using xsave/xrstor infrastructure.
Looking at the memory layout saved by "xsave", one can't say which state
is represented in the memory layout. This is because if a particular state is
in init state, in the xsave hdr it can be represented by bit '0'. And hence
we can't really say by the xsave header wether a state is in init state or
the state is not saved in the memory layout.
And hence the xsave memory layout available through this regset
interface uses SW usable bytes [464..511] to convey what state is represented
in the memory layout.
First 8 bytes of the sw_usable_bytes[464..467] will be set to OS enabled xstate
mask(which is same as the 64bit mask returned by the xgetbv's xCR0).
The note NT_X86_XSTATE represents the extended state information in the
core file, using the above mentioned memory layout.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20100211195614.802495327@sbs-t61.sc.intel.com>
Signed-off-by: Hongjiu Lu <hjl.tools@gmail.com>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-pci-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Enable NMI on all cpus on UV
vgaarb: Add user selectability of the number of GPUS in a system
vgaarb: Fix VGA arbiter to accept PCI domains other than 0
x86, uv: Update UV arch to target Legacy VGA I/O correctly.
pci: Update pci_set_vga_state() to call arch functions
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Enable NMI on all cpus in UV system and add an NMI handler
to dump_stack on each cpu.
By default on x86 all the cpus except the boot cpu have NMI
masked off. This patch enables NMI on all cpus in UV system
and adds an NMI handler to dump_stack on each cpu. This
way if a system hangs we can NMI the machine and get a
backtrace from all the cpus.
Version 2: Use x86_platform driver mechanism for nmi init, per
Ingo's suggestion.
Version 3: Clean up Ingo's nits.
Signed-off-by: Russ Anderson <rja@sgi.com>
LKML-Reference: <20100226164912.GA24439@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Add function to direct Legacy VGA I/O traffic to correct I/O Hub.
Signed-off-by: Mike Travis <travis@sgi.com>
LKML-Reference: <201002022238.o12McEbi018727@imap1.linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Robin Holt <holt@sgi.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: David Airlie <airlied@linux.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-setup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, setup: Don't skip mode setting for the standard VGA modes
x86-64, setup: Inhibit decompressor output if video info is invalid
x86, setup: When restoring the screen, update boot_params.screen_info
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The code for setting standard VGA modes probes for the current mode,
and skips the mode setting if the mode is 3 (color text 80x25) or 7
(mono text 80x25). Unfortunately, there are BIOSes, including the
VMware BIOS, which report the previous mode if function 0F is queried
while the screen is in a VESA mode, and of course, nothing can help a
mode poked directly into the hardware.
As such, the safe option is to set the mode anyway, and only query to
see if we should be using mode 7 rather than mode 3. People who don't
want any mode setting at all should probably use vga=0x0f04
(VIDEO_CURRENT_MODE). It's possible that should be the kernel
default.
Reported-by Rene Arends <R.R.Arends@hro.nl>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <tip-*@git.kernel.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Inhibit output from the kernel decompressor if the video information
is invalid. This was already the case for 32 bits, make 64 bits
match.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <tip-*@git.kernel.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
When we restore the screen content after a mode change, we return the
cursor to its former position. However, we need to also update
boot_params.screen_info accordingly, so that the decompression code
knows where on the screen the cursor is. Just in case the video BIOS
does something extra screwy, read the cursor position back from the
BIOS instead of relying on it doing the right thing.
While we're at it, make sure we cap the cursor position to the new
screen coordinates.
Reported-by: Wim Osterholt <wim@djo.tudelft.nl>
Bugzilla-Reference: http://bugzilla.kernel.org/show_bug.cgi?id=15329
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|
|\ \ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-rwsem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86-64, rwsem: Avoid store forwarding hazard in __downgrade_write
x86-64, rwsem: 64-bit xadd rwsem implementation
x86: Fix breakage of UML from the changes in the rwsem system
x86-64: support native xadd rwsem implementation
x86: clean up rwsem type system
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
The Intel Architecture Optimization Reference Manual states that a short
load that follows a long store to the same object will suffer a store
forwading penalty, particularly if the two accesses use different addresses.
Trivially, a long load that follows a short store will also suffer a penalty.
__downgrade_write() in rwsem incurs both penalties: the increment operation
will not be able to reuse a recently-loaded rwsem value, and its result will
not be reused by any recently-following rwsem operation.
A comment in the code states that this is because 64-bit immediates are
special and expensive; but while they are slightly special (only a single
instruction allows them), they aren't expensive: a test shows that two loops,
one loading a 32-bit immediate and one loading a 64-bit immediate, both take
1.5 cycles per iteration.
Fix this by changing __downgrade_write to use the same add instruction on
i386 and on x86_64, so that it uses the same operand size as all the other
rwsem functions.
Signed-off-by: Avi Kivity <avi@redhat.com>
LKML-Reference: <1266049992-17419-1-git-send-email-avi@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
For x86-64, 32767 threads really is not enough. Change rwsem_count_t
to a signed long, so that it is 64 bits on x86-64.
This required the following changes to the assembly code:
a) %z0 doesn't work on all versions of gcc! At least gcc 4.4.2 as
shipped with Fedora 12 emits "ll" not "q" for 64 bits, even for
integer operands. Newer gccs apparently do this correctly, but
avoid this problem by using the _ASM_ macros instead of %z.
b) 64 bits immediates are only allowed in "movq $imm,%reg"
constructs... no others. Change some of the constraints to "e",
and fix the one case where we would have had to use an invalid
immediate -- in that case, we only care about the upper half
anyway, so just access the upper half.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <tip-bafaecd11df15ad5b1e598adc7736afcd38ee13d@git.kernel.org>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
The patches 5d0b7235d83eefdafda300656e97d368afcafc9a and
bafaecd11df15ad5b1e598adc7736afcd38ee13d broke the UML build:
On Sun, 17 Jan 2010, Ingo Molnar wrote:
>
> FYI, -tip testing found that these changes break the UML build:
>
> kernel/built-in.o: In function `__up_read':
> /home/mingo/tip/arch/x86/include/asm/rwsem.h:192: undefined reference to `call_rwsem_wake'
> kernel/built-in.o: In function `__up_write':
> /home/mingo/tip/arch/x86/include/asm/rwsem.h:210: undefined reference to `call_rwsem_wake'
> kernel/built-in.o: In function `__downgrade_write':
> /home/mingo/tip/arch/x86/include/asm/rwsem.h:228: undefined reference to `call_rwsem_downgrade_wake'
> kernel/built-in.o: In function `__down_read':
> /home/mingo/tip/arch/x86/include/asm/rwsem.h:112: undefined reference to `call_rwsem_down_read_failed'
> kernel/built-in.o: In function `__down_write_nested':
> /home/mingo/tip/arch/x86/include/asm/rwsem.h:154: undefined reference to `call_rwsem_down_write_failed'
> collect2: ld returned 1 exit status
Add lib/rwsem_64.o to the UML subarch objects to fix.
LKML-Reference: <alpine.LFD.2.00.1001171023440.13231@localhost.localdomain>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
This one is much faster than the spinlock based fallback rwsem code,
with certain artifical benchmarks having shown 300%+ improvement on
threaded page faults etc.
Again, note the 32767-thread limit here. So this really does need that
whole "make rwsem_count_t be 64-bit and fix the BIAS values to match"
extension on top of it, but that is conceptually a totally independent
issue.
NOT TESTED! The original patch that this all was based on were tested by
KAMEZAWA Hiroyuki, but maybe I screwed up something when I created the
cleaned-up series, so caveat emptor..
Also note that it _may_ be a good idea to mark some more registers
clobbered on x86-64 in the inline asms instead of saving/restoring them.
They are inline functions, but they are only used in places where there
are not a lot of live registers _anyway_, so doing for example the
clobbers of %r8-%r11 in the asm wouldn't make the fast-path code any
worse, and would make the slow-path code smaller.
(Not that the slow-path really matters to that degree. Saving a few
unnecessary registers is the _least_ of our problems when we hit the slow
path. The instruction/cycle counting really only matters in the fast
path).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <alpine.LFD.2.00.1001121810410.17145@localhost.localdomain>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
The fast version of the rwsems (the code that uses xadd) has
traditionally only worked on x86-32, and as a result it mixes different
kinds of types wildly - they just all happen to be 32-bit. We have
"long", we have "__s32", and we have "int".
To make it work on x86-64, the types suddenly matter a lot more. It can
be either a 32-bit or 64-bit signed type, and both work (with the caveat
that a 32-bit counter will only have 15 bits of effective write
counters, so it's limited to 32767 users). But whatever type you
choose, it needs to be used consistently.
This makes a new 'rwsem_counter_t', that is a 32-bit signed type. For a
64-bit type, you'd need to also update the BIAS values.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <alpine.LFD.2.00.1001121755220.17145@localhost.localdomain>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|
|\ \ \ \ \ \
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-numa-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, numa: Remove configurable node size support for numa emulation
x86, numa: Add fixed node size option for numa emulation
x86, numa: Fix numa emulation calculation of big nodes
x86, acpi: Map hotadded cpu to correct node.
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Now that numa=fake=<size>[MG] is implemented, it is possible to remove
configurable node size support. The command-line parsing was already
broken (numa=fake=*128, for example, would not work) and since fake nodes
are now interleaved over physical nodes, this support is no longer
required.
Signed-off-by: David Rientjes <rientjes@google.com>
LKML-Reference: <alpine.DEB.2.00.1002151343080.26927@chino.kir.corp.google.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
numa=fake=N specifies the number of fake nodes, N, to partition the
system into and then allocates them by interleaving over physical nodes.
This requires knowledge of the system capacity when attempting to
allocate nodes of a certain size: either very large nodes to benchmark
scalability of code that operates on individual nodes, or very small
nodes to find bugs in the VM.
This patch introduces numa=fake=<size>[MG] so it is possible to specify
the size of each node to allocate. When used, nodes of the size
specified will be allocated and interleaved over the set of physical
nodes.
FAKE_NODE_MIN_SIZE was also moved to the more-appropriate
include/asm/numa_64.h.
Signed-off-by: David Rientjes <rientjes@google.com>
LKML-Reference: <alpine.DEB.2.00.1002151342510.26927@chino.kir.corp.google.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
numa=fake=N uses split_nodes_interleave() to partition the system into N
fake nodes. Each node size must have be a multiple of
FAKE_NODE_MIN_SIZE, otherwise it is possible to get strange alignments.
Because of this, the remaining memory from each node when rounded to
FAKE_NODE_MIN_SIZE is consolidated into a number of "big nodes" that are
bigger than the rest.
The calculation of the number of big nodes is incorrect since it is using
a logical AND operator when it should be multiplying the rounded-off
portion of each node with N.
Signed-off-by: David Rientjes <rientjes@google.com>
LKML-Reference: <alpine.DEB.2.00.1002151342230.26927@chino.kir.corp.google.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
When hotadd new cpu to system, if its affinitive node is online,
should map the cpu to its own node. Otherwise, let kernel select one
online node for the new cpu later.
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
LKML-Reference: <4B6AAA39.6000300@linux.intel.com>
Tested-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|
|\ \ \ \ \ \ \
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-mtrr-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Convert set_atomicity_lock to raw_spinlock
x86, mtrr: Kill over the top warn
x86, mtrr: Constify struct mtrr_ops
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Fixes bugzilla: http://bugzilla.kernel.org/show_bug.cgi?id=12558
Fixes bugzilla: http://bugzilla.kernel.org/show_bug.cgi?id=12317
(and if this really needed to be a warn you'd be responding to the bugs left
in bugzilla from it...)
Signed-off-by: Alan Cox <alan@linux.intel.com>
LKML-Reference: <20100208100239.2568.2940.stgit@localhost.localdomain>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|