| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Bisection between 3.11 and 3.12 fingered commit 9824cf97 ("mm:
vmstats: tlb flush counters") to cause overhead problems.
The counters are undeniably useful but how often do we really
need to debug TLB flush related issues? It does not justify
taking the penalty everywhere so make it a debugging option.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Tested-by: Davidlohr Bueso <davidlohr@hp.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-XzxjntugxuwpxXhcrxqqh53b@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous patch doing vmstats for TLB flushes ("mm: vmstats: tlb flush
counters") effectively missed UP since arch/x86/mm/tlb.c is only compiled
for SMP.
UP systems do not do remote TLB flushes, so compile those counters out on
UP.
arch/x86/kernel/cpu/mtrr/generic.c calls __flush_tlb() directly. This is
probably an optimization since both the mtrr code and __flush_tlb() write
cr4. It would probably be safe to make that a flush_tlb_all() (and then
get these statistics), but the mtrr code is ancient and I'm hesitant to
touch it other than to just stick in the counters.
[akpm@linux-foundation.org: tweak comments]
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
| |
Remove the extra tab in __flush_tlb_one().
CC: Alex Shi <alex.shi@intel.com>
CC: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/51AD8902.60603@linux.vnet.ibm.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
|
|
|
|
|
|
|
|
| |
This function is called in __native_flush_tlb_global() and after
apply_microcode_early().
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1356075872-3054-8-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
|
|
|
|
|
|
|
| |
All 486+ CPUs support INVLPG, so remove the fallback 386 support
code.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/1354132230-21854-6-git-send-email-hpa@linux.intel.com
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The incompatible parameter of flush_tlb_mm_range cause build warning.
Fix it by correct parameter.
Ingo Molnar found that this could also cause a user space crash.
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Alex Shi <alex.shi@intel.com>
Link: http://lkml.kernel.org/r/1342747103-19765-1-git-send-email-alex.shi@intel.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch do flush_tlb_kernel_range by 'invlpg'. The performance pay
and gain was analyzed in previous patch
(x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range).
In the testing: http://lkml.org/lkml/2012/6/21/10
The pay is mostly covered by long kernel path, but the gain is still
quite clear, memory access in user APP can increase 30+% when kernel
execute this funtion.
Signed-off-by: Alex Shi <alex.shi@intel.com>
Link: http://lkml.kernel.org/r/1340845344-27557-10-git-send-email-alex.shi@intel.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Not every tlb_flush execution moment is really need to evacuate all
TLB entries, like in munmap, just few 'invlpg' is better for whole
process performance, since it leaves most of TLB entries for later
accessing.
This patch also rewrite flush_tlb_range for 2 purposes:
1, split it out to get flush_blt_mm_range function.
2, clean up to reduce line breaking, thanks for Borislav's input.
My micro benchmark 'mummap' http://lkml.org/lkml/2012/5/17/59
show that the random memory access on other CPU has 0~50% speed up
on a 2P * 4cores * HT NHM EP while do 'munmap'.
Thanks Yongjie's testing on this patch:
-------------
I used Linux 3.4-RC6 w/ and w/o his patches as Xen dom0 and guest
kernel.
After running two benchmarks in Xen HVM guest, I found his patches
brought about 1%~3% performance gain in 'kernel build' and 'netperf'
testing, though the performance gain was not very stable in 'kernel
build' testing.
Some detailed testing results are below.
Testing Environment:
Hardware: Romley-EP platform
Xen version: latest upstream
Linux kernel: 3.4-RC6
Guest vCPU number: 8
NIC: Intel 82599 (10GB bandwidth)
In 'kernel build' testing in guest:
Command line | performance gain
make -j 4 | 3.81%
make -j 8 | 0.37%
make -j 16 | -0.52%
In 'netperf' testing, we tested TCP_STREAM with default socket size
16384 byte as large packet and 64 byte as small packet.
I used several clients to add networking pressure, then 'netperf' server
automatically generated several threads to response them.
I also used large-size packet and small-size packet in the testing.
Packet size | Thread number | performance gain
16384 bytes | 4 | 0.02%
16384 bytes | 8 | 2.21%
16384 bytes | 16 | 2.04%
64 bytes | 4 | 1.07%
64 bytes | 8 | 3.31%
64 bytes | 16 | 0.71%
Signed-off-by: Alex Shi <alex.shi@intel.com>
Link: http://lkml.kernel.org/r/1340845344-27557-8-git-send-email-alex.shi@intel.com
Tested-by: Ren, Yongjie <yongjie.ren@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
x86 has no flush_tlb_range support in instruction level. Currently the
flush_tlb_range just implemented by flushing all page table. That is not
the best solution for all scenarios. In fact, if we just use 'invlpg' to
flush few lines from TLB, we can get the performance gain from later
remain TLB lines accessing.
But the 'invlpg' instruction costs much of time. Its execution time can
compete with cr3 rewriting, and even a bit more on SNB CPU.
So, on a 512 4KB TLB entries CPU, the balance points is at:
(512 - X) * 100ns(assumed TLB refill cost) =
X(TLB flush entries) * 100ns(assumed invlpg cost)
Here, X is 256, that is 1/2 of 512 entries.
But with the mysterious CPU pre-fetcher and page miss handler Unit, the
assumed TLB refill cost is far lower then 100ns in sequential access. And
2 HT siblings in one core makes the memory access more faster if they are
accessing the same memory. So, in the patch, I just do the change when
the target entries is less than 1/16 of whole active tlb entries.
Actually, I have no data support for the percentage '1/16', so any
suggestions are welcomed.
As to hugetlb, guess due to smaller page table, and smaller active TLB
entries, I didn't see benefit via my benchmark, so no optimizing now.
My micro benchmark show in ideal scenarios, the performance improves 70
percent in reading. And in worst scenario, the reading/writing
performance is similar with unpatched 3.4-rc4 kernel.
Here is the reading data on my 2P * 4cores *HT NHM EP machine, with THP
'always':
multi thread testing, '-t' paramter is thread number:
with patch unpatched 3.4-rc4
./mprotect -t 1 14ns 24ns
./mprotect -t 2 13ns 22ns
./mprotect -t 4 12ns 19ns
./mprotect -t 8 14ns 16ns
./mprotect -t 16 28ns 26ns
./mprotect -t 32 54ns 51ns
./mprotect -t 128 200ns 199ns
Single process with sequencial flushing and memory accessing:
with patch unpatched 3.4-rc4
./mprotect 7ns 11ns
./mprotect -p 4096 -l 8 -n 10240
21ns 21ns
[ hpa: http://lkml.kernel.org/r/1B4B44D9196EFF41AE41FDA404FC0A100BFF94@SHSMSX101.ccr.corp.intel.com
has additional performance numbers. ]
Signed-off-by: Alex Shi <alex.shi@intel.com>
Link: http://lkml.kernel.org/r/1340845344-27557-3-git-send-email-alex.shi@intel.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm changes from Ingo Molnar:
"This tree includes a micro-optimization that avoids cr3 switches
during idling; it fixes corner cases and there's also small cleanups"
Fix up trivial context conflict with the percpu_xx -> this_cpu_xx
changes.
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86-64: Fix accounting in kernel_physical_mapping_init()
x86/tlb: Clean up and unify TLB_FLUSH_ALL definition
x86: Drop obsolete ARCH_BOOTMEM support
x86, tlb: Switch cr3 in leave_mm() only when needed
x86/mm: Fix the size calculation of mapping tables
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Since sizeof(long) is 4 in x86_32 mode, and it's 8 in x86_64
mode, sizeof(long long) is also 8 byte in x86_64 mode.
use long mode can fit TLB_FLUSH_ALL defination here both in 32
or 64 bits mode.
Signed-off-by: Alex Shi <alex.shi@intel.com>
Link: http://lkml.kernel.org/n/tip-evv5bekiipi2pmyzdsy8lkkw@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Since percpu_xxx() serial functions are duplicated with this_cpu_xxx().
Removing percpu_xxx() definition and replacing them by this_cpu_xxx()
in code. There is no function change in this patch, just preparation for
later percpu_xxx serial function removing.
On x86 machine the this_cpu_xxx() serial functions are same as
__this_cpu_xxx() without no unnecessary premmpt enable/disable.
Thanks for Stephen Rothwell, he found and fixed a i386 build error in
the patch.
Also thanks for Andrew Morton, he kept updating the patchset in Linus'
tree.
Signed-off-by: Alex Shi <alex.shi@intel.com>
Acked-by: Christoph Lameter <cl@gentwo.org>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|/
|
|
|
|
|
|
| |
Disintegrate asm/system.h for X86.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
cc: x86@kernel.org
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds an initial page table with low mappings used exclusively
for booting APs/resuming after ACPI suspend/machine restart. After this,
there's no need to add low mappings to swapper_pg_dir and zap them later
or create own swsusp PGD page solely for ACPI sleep needs - we have
initial_page_table for that.
Signed-off-by: Borislav Petkov <bp@alien8.de>
LKML-Reference: <20101020070526.GA9588@liondog.tnic>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Only one cpu is there, just call __flush_tlb for it. Fixes the following boot
warning on x86:
[ 0.000000] Memory: 885032k/915540k available (5993k kernel code, 29844k reserved, 3842k data, 428k init, 0k highmem)
[ 0.000000] virtual kernel memory layout:
[ 0.000000] fixmap : 0xffe17000 - 0xfffff000 (1952 kB)
[ 0.000000] vmalloc : 0xf8615000 - 0xffe15000 ( 120 MB)
[ 0.000000] lowmem : 0xc0000000 - 0xf7e15000 ( 894 MB)
[ 0.000000] .init : 0xc19a5000 - 0xc1a10000 ( 428 kB)
[ 0.000000] .data : 0xc15da4bb - 0xc199af6c (3842 kB)
[ 0.000000] .text : 0xc1000000 - 0xc15da4bb (5993 kB)
[ 0.000000] Checking if this processor honours the WP bit even in supervisor mode...Ok.
[ 0.000000] ------------[ cut here ]------------
[ 0.000000] WARNING: at kernel/smp.c:369 smp_call_function_many+0x50/0x1b0()
[ 0.000000] Hardware name: System Product Name
[ 0.000000] Modules linked in:
[ 0.000000] Pid: 0, comm: swapper Not tainted 2.6.30-tip #52504
[ 0.000000] Call Trace:
[ 0.000000] [<c104aa16>] warn_slowpath_common+0x65/0x95
[ 0.000000] [<c104aa58>] warn_slowpath_null+0x12/0x15
[ 0.000000] [<c1073bbe>] smp_call_function_many+0x50/0x1b0
[ 0.000000] [<c1037615>] ? do_flush_tlb_all+0x0/0x41
[ 0.000000] [<c1037615>] ? do_flush_tlb_all+0x0/0x41
[ 0.000000] [<c1073d4f>] smp_call_function+0x31/0x58
[ 0.000000] [<c1037615>] ? do_flush_tlb_all+0x0/0x41
[ 0.000000] [<c104f635>] on_each_cpu+0x26/0x65
[ 0.000000] [<c10374b5>] flush_tlb_all+0x19/0x1b
[ 0.000000] [<c1032ab3>] zap_low_mappings+0x4d/0x56
[ 0.000000] [<c15d64b5>] ? printk+0x14/0x17
[ 0.000000] [<c19b42a8>] mem_init+0x23d/0x245
[ 0.000000] [<c19a56a1>] start_kernel+0x17a/0x2d5
[ 0.000000] [<c19a5347>] ? unknown_bootoption+0x0/0x19a
[ 0.000000] [<c19a5039>] __init_begin+0x39/0x41
[ 0.000000] ---[ end trace 4eaa2a86a8e2da22 ]---
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
|
|\
| |
| |
| |
| |
| |
| | |
Merge reason: tracing/core was on a .30-rc1 base and was missing out on
on a handful of tracing fixes present in .30-rc5-almost.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
In non-SMP mode, the variable section attribute specified by DECLARE_PER_CPU()
does not agree with that specified by DEFINE_PER_CPU(). This means that
architectures that have a small data section references relative to a base
register may throw up linkage errors due to too great a displacement between
where the base register points and the per-CPU variable.
On FRV, the .h declaration says that the variable is in the .sdata section, but
the .c definition says it's actually in the .data section. The linker throws
up the following errors:
kernel/built-in.o: In function `release_task':
kernel/exit.c:78: relocation truncated to fit: R_FRV_GPREL12 against symbol `per_cpu__process_counts' defined in .data section in kernel/built-in.o
kernel/exit.c:78: relocation truncated to fit: R_FRV_GPREL12 against symbol `per_cpu__process_counts' defined in .data section in kernel/built-in.o
To fix this, DECLARE_PER_CPU() should simply apply the same section attribute
as does DEFINE_PER_CPU(). However, this is made slightly more complex by
virtue of the fact that there are several variants on DEFINE, so these need to
be matched by variants on DECLARE.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|/
|
|
|
|
|
|
|
|
|
|
| |
currently these are paravirtulaized, doesn't appear any callers rely on
this (no pv_ops backends are using native_tlb and overriding cr3/4
access).
[ Impact: fix lockdep warning with paravirt and function tracer ]
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
LKML-Reference: <20090423172138.GR3036@sequoia.sous-sol.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
| |
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|\ \ |
|
| |/
|/|
| |
| |
| |
| |
| | |
Impact: cleanup, moving NON-SMP stuff from smp.h
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Impact: reduce stack usage, use new cpumask API.
This is made a little more tricky by uv_flush_tlb_others which
actually alters its argument, for an IPI to be sent to the remaining
cpus in the mask.
I solve this by allocating a cpumask_var_t for this case and falling back
to IPI should this fail.
To eliminate temporaries in the caller, all flush_tlb_others implementations
now do the this-cpu-elimination step themselves.
Note also the curious "cpus_or(f->flush_cpumask, cpumask, f->flush_cpumask)"
which has been there since pre-git and yet f->flush_cpumask is always zero
at this point.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
|
|
|
|
|
|
|
|
|
| |
Change header guards named "ASM_X86__*" to "_ASM_X86_*" since:
a. the double underscore is ugly and pointless.
b. no leading underscore violates namespace constraints.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|