summaryrefslogtreecommitdiffstats
path: root/arch/x86
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'stable/for-linus-3.6-rc1-tag' of ↵Linus Torvalds2012-08-161-0/+5
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull Xen fix from Konrad Rzeszutek Wilk: "Way back in v3.5 we added a mechanism to populate back pages that were released (they overlapped with MMIO regions), but neglected to reserve the proper amount of virtual space for extend_brk to work properly. Coincidentally some other commit aligned the _brk space to larger area so I didn't trigger this until it was run on a machine with more than 2GB of MMIO space." * On machines with large MMIO/PCI E820 spaces we fail to boot b/c we failed to pre-allocate large enough virtual space for extend_brk. * tag 'stable/for-linus-3.6-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/p2m: Reserve 8MB of _brk space for P2M leafs when populating back.
| * xen/p2m: Reserve 8MB of _brk space for P2M leafs when populating back.Konrad Rzeszutek Wilk2012-08-021-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we release pages back during bootup: Freeing 9d-100 pfn range: 99 pages freed Freeing 9cf36-9d0d2 pfn range: 412 pages freed Freeing 9f6bd-9f6bf pfn range: 2 pages freed Freeing 9f714-9f7bf pfn range: 171 pages freed Freeing 9f7e0-9f7ff pfn range: 31 pages freed Freeing 9f800-100000 pfn range: 395264 pages freed Released 395979 pages of unused memory We then try to populate those pages back. In the P2M tree however the space for those leafs must be reserved - as such we use extend_brk. We reserve 8MB of _brk space, which means we can fit over 1048576 PFNs - which is more than we should ever need. Without this, on certain compilation of the kernel we would hit: (XEN) domain_crash_sync called from entry.S (XEN) CPU: 0 (XEN) RIP: e033:[<ffffffff818aad3b>] (XEN) RFLAGS: 0000000000000206 EM: 1 CONTEXT: pv guest (XEN) rax: ffffffff81a7c000 rbx: 000000000000003d rcx: 0000000000001000 (XEN) rdx: ffffffff81a7b000 rsi: 0000000000001000 rdi: 0000000000001000 (XEN) rbp: ffffffff81801cd8 rsp: ffffffff81801c98 r8: 0000000000100000 (XEN) r9: ffffffff81a7a000 r10: 0000000000000001 r11: 0000000000000003 (XEN) r12: 0000000000000004 r13: 0000000000000004 r14: 000000000000003d (XEN) r15: 00000000000001e8 cr0: 000000008005003b cr4: 00000000000006f0 (XEN) cr3: 0000000125803000 cr2: 0000000000000000 (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e02b cs: e033 (XEN) Guest stack trace from rsp=ffffffff81801c98: .. which is extend_brk hitting a BUG_ON. Interestingly enough, most of the time we are not going to hit this b/c the _brk space is quite large (v3.5): ffffffff81a25000 B __brk_base ffffffff81e43000 B __brk_limit = ~4MB. vs earlier kernels (with this back-ported), the space is smaller: ffffffff81a25000 B __brk_base ffffffff81a7b000 B __brk_limit = 344 kBytes. where we would certainly hit this and hit extend_brk. Note that git commit c3d93f880197953f86ab90d9da4744e926b38e33 (xen: populate correct number of pages when across mem boundary (v2)) exposed this bug). [v1: Made it 8MB of _brk space instead of 4MB per Jan's suggestion] CC: stable@vger.kernel.org #only for 3.5 Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* | Merge branch 'release' of ↵Linus Torvalds2012-08-035-15/+14
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux Pull ACPI and power management fixes from Len Brown: "A 3.3 sleep regression fixed, numa bugfix, plus some minor cleanups" * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: ACPI processor: Fix tick_broadcast_mask online/offline regression ACPI: Only count valid srat memory structures ACPI: Untangle a return statement for better readability ACPI / PCI: Do not try to acquire _OSC control if that is hopeless ACPI: delete _GTS/_BFS support ACPI/x86: revert 'x86, acpi: Call acpi_enter_sleep_state via an asmlinkage C function from assembler' ACPI: replace strlen("string") with sizeof("string") -1 ACPI / PM: Fix build warning in sleep.c for CONFIG_ACPI_SLEEP unset
| | \
| | \
| *-. \ Merge branches 'delete-gts-bfs', 'misc', 'novell-bugzilla-757888-numa' and ↵Len Brown2012-08-03193-4040/+11247
| |\ \ \ | | | | | | | | | | | | | | | 'osc-pcie' into base
| | | * | ACPI: Only count valid srat memory structuresThomas Renninger2012-08-031-7/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Otherwise you could run into: WARN_ON in numa_register_memblks(), because node_possible_map is zero References: https://bugzilla.novell.com/show_bug.cgi?id=757888 On this machine (ProLiant ML570 G3) the SRAT table contains: - No processor affinities - One memory affinity structure (which is set disabled) CC: Per Jessen <per@opensuse.org> CC: Andi Kleen <andi@firstfloor.org> Signed-off-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Len Brown <len.brown@intel.com>
| * | | | ACPI/x86: revert 'x86, acpi: Call acpi_enter_sleep_state via an asmlinkage C ↵Len Brown2012-07-304-8/+6
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | function from assembler' cd74257b974d6d26442c97891c4d05772748b177 patched up GTS/BFS -- a feature we want to remove. So revert it (by hand, due to conflict in sleep.h) to prepare for GTS/BFS removal. Signed-off-by: Len Brown <len.brown@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* | | | Merge git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2012-08-033-7/+34
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull KVM bug fixes from Marcelo Tosatti: - Fix DS/ES segment register corruption on x86_32. - Fix kvmclock wallclock migration offset. - Fix PIT interrupt ACK vs system reset logic bug. * git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: VMX: Fix ds/es corruption on i386 with preemption KVM: x86: apply kvmclock offset to guest wall clock time KVM: PIC: call ack notifiers for irqs that are dropped form irr
| * | | | KVM: VMX: Fix ds/es corruption on i386 with preemptionAvi Kivity2012-08-011-7/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit b2da15ac26a0c ("KVM: VMX: Optimize %ds, %es reload") broke i386 in the following scenario: vcpu_load ... vmx_save_host_state vmx_vcpu_run (ds.rpl, es.rpl cleared by hardware) interrupt push ds, es # pushes bad ds, es schedule vmx_vcpu_put vmx_load_host_state reload ds, es (with __USER_DS) pop ds, es # of other thread's stack iret # other thread runs interrupt push ds, es schedule # back in vcpu thread pop ds, es # now with rpl=0 iret ... vcpu_put resume_userspace iret # clears ds, es due to mismatched rpl (instead of resume_userspace, we might return with SYSEXIT and then take an exception; when the exception IRETs we end up with cleared ds, es) Fix by avoiding the optimization on i386 and reloading ds, es on the lightweight exit path. Reported-by: Chris Clayron <chris2553@googlemail.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * | | | KVM: x86: apply kvmclock offset to guest wall clock timeBruce Rogers2012-08-011-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a guest migrates to a new host, the system time difference from the previous host is used in the updates to the kvmclock system time visible to the guest, resulting in a continuation of correct kvmclock based guest timekeeping. The wall clock component of the kvmclock provided time is currently not updated with this same time offset. Since the Linux guest caches the wall clock based time, this discrepency is not noticed until the guest is rebooted. After reboot the guest's time calculations are off. This patch adjusts the wall clock by the kvmclock_offset, resulting in correct guest time after a reboot. Cc: Zachary Amsden <zamsden@gmail.com> Signed-off-by: Bruce Rogers <brogers@suse.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * | | | KVM: PIC: call ack notifiers for irqs that are dropped form irrGleb Natapov2012-07-261-0/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After commit 242ec97c358256 PIT interrupts are no longer delivered after PIC reset. It happens because PIT injects interrupt only if previous one was acked, but since on PIC reset it is dropped from irr it will never be delivered and hence acknowledged. Fix that by calling ack notifier on PIC reset. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* | | | | Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds2012-08-039-17/+58
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Various fixes" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86-64, kcmp: The kcmp system call can be common arch/x86/kernel/kdebugfs.c: Ensure a consistent return value in error case x86/mce: Add quirk for instruction recovery on Sandy Bridge processors x86/mce: Move MCACOD defines from mce-severity.c to <asm/mce.h> x86/ioapic: Fix NULL pointer dereference on CPU hotplug after disabling irqs x86, nops: Missing break resulting in incorrect selection on Intel x86: CONFIG_CC_STACKPROTECTOR=y is no longer experimental
| * | | | | x86-64, kcmp: The kcmp system call can be commonH. Peter Anvin2012-08-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We already use the same system call handler for i386 and x86-64, there is absolutely no reason x32 can't use the same system call, too. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: H.J. Lu <hjl.tools@gmail.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: <stable@vger.kernel.org> v3.5 Link: http://lkml.kernel.org/n/tip-vwzk3qbcr3yjyxjg2j38vgy9@git.kernel.org
| * | | | | arch/x86/kernel/kdebugfs.c: Ensure a consistent return value in error caseJulia Lawall2012-07-261-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Typically, the return value desired for the failure of a function with an integer return value is a negative integer. In these cases, the return value is sometimes a negative integer and sometimes 0, due to a subsequent initialization of the return variable within the loop. A simplified version of the semantic match that finds this problem is: (http://coccinelle.lip6.fr/) //<smpl> @r exists@ identifier ret; position p; constant C; expression e1,e3,e4; statement S; @@ ret = -C ... when != ret = e3 when any if@p (...) S ... when any if (\(ret != 0\|ret < 0\|ret > 0\) || ...) { ... return ...; } ... when != ret = e3 when any *if@p (...) { ... when != ret = e4 return ret; } //</smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Link: http://lkml.kernel.org/r/1342284188-19176-7-git-send-email-Julia.Lawall@lip6.fr Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | x86/mce: Add quirk for instruction recovery on Sandy Bridge processorsTony Luck2012-07-261-3/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sandy Bridge processors follow the SDM (Vol 3B, Table 15-20) and set both the RIPV and EIPV bits in the MCG_STATUS register to zero for machine checks during instruction fetch. This is more than a little counter-intuitive and means that Linux cannot recover from these errors. Rather than insert special case code at several places in mce.c and mce-severity.c, we pretend the EIPV bit was set for just this case early in processing the machine check. Acked-by: Borislav Petkov <bp@amd64.org> Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: Chen Gong <gong.chen@linux.intel.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Link: http://lkml.kernel.org/r/180a06f3f357cf9f78259ae443a082b14a29535b.1343078495.git.tony.luck@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | x86/mce: Move MCACOD defines from mce-severity.c to <asm/mce.h>Tony Luck2012-07-262-7/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We will need some of these values in mce.c. Move them to the appropriate header file so they are available. Acked-by: Borislav Petkov <bp@amd64.org> Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: Chen Gong <gong.chen@linux.intel.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Link: http://lkml.kernel.org/r/0ccfb1af5fe35e537b7cd8e4d448bf7d851dbfb9.1343078495.git.tony.luck@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | x86/ioapic: Fix NULL pointer dereference on CPU hotplug after disabling irqsTomoki Sekiyama2012-07-262-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the current kernel, percpu variable `vector_irq' is not always cleared when a CPU is offlined. If the CPU that has the disabled irqs in vector_irq is hotplugged again, __setup_vector_irq() hits invalid irq vector and may crash. This bug can be reproduced as following; # echo 0 > /sys/devices/system/cpu/cpu7/online # modprobe -r some_driver_using_interrupts # vector_irq@cpu7 uncleared # echo 1 > /sys/devices/system/cpu/cpu7/online # kernel may crash To fix this problem, this patch clears vector_irq in __fixup_irqs() when the CPU is offlined. This also reverts commit f6175f5bfb4c, which partially fixes this bug by clearing vector in __clear_irq_vector(). But in environments with IOMMU IRQ remapper, it could fail because cfg->domain doesn't contain offlined CPUs. With this patch, the fix in __clear_irq_vector() can be reverted because every vector_irq is already cleared in __fixup_irqs() on offlined CPUs. Signed-off-by: Tomoki Sekiyama <tomoki.sekiyama.qu@hitachi.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: yrl.pp-manager.tt@hitachi.com Cc: Yinghai Lu <yinghai@kernel.org> Cc: Alexander Gordeev <agordeev@redhat.com> Link: http://lkml.kernel.org/r/20120726104732.2889.19144.stgit@kvmdev Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | Merge branch 'linus' into x86/urgentIngo Molnar2012-07-2596-1519/+3848
| |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge in Linus's tree to avoid a conflict. Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | x86, nops: Missing break resulting in incorrect selection on IntelAlan Cox2012-07-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Intel case falls through into the generic case which then changes the values. For cases like the P6 it doesn't do the right thing so this seems to be a screwup. Signed-off-by: Alan Cox <alan@linux.intel.com> Link: http://lkml.kernel.org/n/tip-lww2uirad4skzjlmrm0vru8o@git.kernel.org Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: <stable@vger.kernel.org>
| * | | | | | x86: CONFIG_CC_STACKPROTECTOR=y is no longer experimentalJean Delvare2012-07-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This feature has been around for over 5 years now, and has no CONFIG_EXPERIMENTAL dependency anymore, so remove the '(EXPERIMENTAL)' tag from the help text as well. Signed-off-by: Jean Delvare <jdelvare@suse.de> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Link: http://lkml.kernel.org/r/1341583705.4655.18.camel@amber.site Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | | | Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds2012-08-036-18/+115
|\ \ \ \ \ \ \ | |_|_|_|_|/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Fix merge window fallout and fix sleep profiling (this was always broken, so it's not a fix for the merge window - we can skip this one from the head of the tree)." * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/trace: Add ability to set a target task for events perf/x86: Fix USER/KERNEL tagging of samples properly perf/x86/intel/uncore: Make UNCORE_PMU_HRTIMER_INTERVAL 64-bit
| * | | | | | perf/x86: Fix USER/KERNEL tagging of samples properlyPeter Zijlstra2012-07-315-17/+114
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some PMUs don't provide a full register set for their sample, specifically 'advanced' PMUs like AMD IBS and Intel PEBS which provide 'better' than regular interrupt accuracy. In this case we use the interrupt regs as basis and over-write some fields (typically IP) with different information. The perf core however uses user_mode() to distinguish user/kernel samples, user_mode() relies on regs->cs. If the interrupt skid pushed us over a boundary the new IP might not be in the same domain as the interrupt. Commit ce5c1fe9a9e ("perf/x86: Fix USER/KERNEL tagging of samples") tried to fix this by making the perf core use kernel_ip(). This however is wrong (TM), as pointed out by Linus, since it doesn't allow for VM86 and non-zero based segments in IA32 mode. Therefore, provide a new helper to set the regs->ip field, set_linear_ip(), which massages the regs into a suitable state assuming the provided IP is in fact a linear address. Also modify perf_instruction_pointer() and perf_callchain_user() to deal with segments base offsets. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1341910954.3462.102.camel@twins Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | perf/x86/intel/uncore: Make UNCORE_PMU_HRTIMER_INTERVAL 64-bitAndrew Morton2012-07-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | i386 allmodconfig: arch/x86/kernel/cpu/perf_event_intel_uncore.c: In function 'uncore_pmu_hrtimer': arch/x86/kernel/cpu/perf_event_intel_uncore.c:728: warning: integer overflow in expression arch/x86/kernel/cpu/perf_event_intel_uncore.c: In function 'uncore_pmu_start_hrtimer': arch/x86/kernel/cpu/perf_event_intel_uncore.c:735: warning: integer overflow in expression Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Zheng Yan <zheng.z.yan@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-h84qlqj02zrojmxxybzmy9hi@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | | | Merge branch 'for-linus-3.6' of git://dev.laptop.org/users/dilinger/linux-olpcLinus Torvalds2012-08-025-162/+65
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull OLPC platform updates from Andres Salomon: "These move the OLPC Embedded Controller driver out of arch/x86/platform and into drivers/platform/olpc. OLPC machines are now ARM-based (which means lots of x86 and ARM changes), but are typically pretty self-contained.. so it makes more sense to go through a separate OLPC tree after getting the appropriate review/ACKs." * 'for-linus-3.6' of git://dev.laptop.org/users/dilinger/linux-olpc: x86: OLPC: move s/r-related EC cmds to EC driver Platform: OLPC: move global variables into priv struct Platform: OLPC: move debugfs support from x86 EC driver x86: OLPC: switch over to using new EC driver on x86 Platform: OLPC: add a suspended flag to the EC driver Platform: OLPC: turn EC driver into a platform_driver Platform: OLPC: allow EC cmd to be overridden, and create a workqueue to call it drivers: OLPC: update various drivers to include olpc-ec.h Platform: OLPC: add a stub to drivers/platform/ for the OLPC EC driver
| * | | | | | | x86: OLPC: move s/r-related EC cmds to EC driverAndres Salomon2012-07-312-22/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The new EC driver calls platform-specific suspend and resume hooks; run XO-1-specific EC commands from there, rather than deep in s/r code. If we attempt to run EC commands after the new EC driver has suspended, it is refused by the ec->suspended checks. Signed-off-by: Andres Salomon <dilinger@queued.net> Acked-by: Paul Fox <pgf@laptop.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | | | | Platform: OLPC: move debugfs support from x86 EC driverAndres Salomon2012-07-311-97/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's nothing about the debugfs interface for the EC driver that is architecture-specific, so move it into the arch-independent driver. The code is mostly unchanged with the exception of renamed variables, coding style changes, and API updates. Signed-off-by: Andres Salomon <dilinger@queued.net> Acked-by: Paul Fox <pgf@laptop.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | | | | x86: OLPC: switch over to using new EC driver on x86Andres Salomon2012-07-312-31/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This uses the new EC driver framework in drivers/platform/olpc. The XO-1 and XO-1.5-specific code is still in arch/x86, but the generic stuff (including a new workqueue; no more running EC commands with IRQs disabled!) can be shared with other architectures. Signed-off-by: Andres Salomon <dilinger@queued.net> Acked-by: Paul Fox <pgf@laptop.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | | | | drivers: OLPC: update various drivers to include olpc-ec.hAndres Salomon2012-07-315-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Switch over to using olpc-ec.h in multiple steps, so as not to break builds. This covers every driver that calls olpc_ec_cmd(). Signed-off-by: Andres Salomon <dilinger@queued.net> Acked-by: Paul Fox <pgf@laptop.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | | | | Platform: OLPC: add a stub to drivers/platform/ for the OLPC EC driverAndres Salomon2012-07-312-18/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The OLPC EC driver has outgrown arch/x86/platform/. It's time to both share common code amongst different architectures, as well as move it out of arch/x86/. The XO-1.75 is ARM-based, and the EC driver shares a lot of code with the x86 code. Signed-off-by: Andres Salomon <dilinger@queued.net> Acked-by: Paul Fox <pgf@laptop.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
* | | | | | | | Merge branch 'for-linus-3.6-rc1' of ↵Linus Torvalds2012-08-011-3/+3
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml Pull UML fixes from Richard Weinberger: "This patch set contains mostly fixes and cleanups. The UML tty driver uses now tty_port and is no longer broken like hell :-)" * 'for-linus-3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: um: Add arch/x86/um to MAINTAINERS um: pass siginfo to guest process um: fix ubd_file_size for read-only files um: pull interrupt_end() into userspace() um: split syscall_trace(), pass pt_regs to it um: switch UPT_SET_RETURN_VALUE and regs_return_value to pt_regs um: set BLK_CGROUP=y in defconfig um: remove count_lock um: fully use tty_port um: Remove dead code um: remove line_ioctl() TTY: um/line, use tty from tty_port TTY: um/line, add tty_port
| * | | | | | | | um: switch UPT_SET_RETURN_VALUE and regs_return_value to pt_regsAl Viro2012-08-011-3/+3
| | |_|_|_|_|/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Richard Weinberger <richard@nod.at>
* | | | | | | | Merge branch 'perf-core-for-linus' of ↵Linus Torvalds2012-07-315-146/+1473
|\ \ \ \ \ \ \ \ | |_|/ / / / / / |/| | / / / / / | | |/ / / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "The biggest changes are Intel Nehalem-EX PMU uncore support, uprobes updates/cleanups/fixes from Oleg and diverse tooling updates (mostly fixes) now that Arnaldo is back from vacation." * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits) uprobes: __replace_page() needs munlock_vma_page() uprobes: Rename vma_address() and make it return "unsigned long" uprobes: Fix register_for_each_vma()->vma_address() check uprobes: Introduce vaddr_to_offset(vma, vaddr) uprobes: Teach build_probe_list() to consider the range uprobes: Remove insert_vm_struct()->uprobe_mmap() uprobes: Remove copy_vma()->uprobe_mmap() uprobes: Fix overflow in vma_address()/find_active_uprobe() uprobes: Suppress uprobe_munmap() from mmput() uprobes: Uprobe_mmap/munmap needs list_for_each_entry_safe() uprobes: Clean up and document write_opcode()->lock_page(old_page) uprobes: Kill write_opcode()->lock_page(new_page) uprobes: __replace_page() should not use page_address_in_vma() uprobes: Don't recheck vma/f_mapping in write_opcode() perf/x86: Fix missing struct before structure name perf/x86: Fix format definition of SNB-EP uncore QPI box perf/x86: Make bitfield unsigned perf/x86: Fix LLC-* and node-* events on Intel SandyBridge perf/x86: Add Intel Nehalem-EX uncore support perf/x86: Fix typo in format definition of uncore PCU filter ...
| * | | | | | perf/x86: Fix missing struct before structure nameJovi Zhang2012-07-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When CONFIG_PERF_EVENTS disabled, there will have a compiliation error, because missing struct before structure name. Signed-off-by: Jovi Zhang <bookjovi@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Jiri Kosina <trivial@kernel.org> Link: http://lkml.kernel.org/r/CACV3sbKF%3DCX%2B2jWEWesfCA6rBoQ3wDM4-5ac9MuBtVbCtMRHdQ@mail.gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | perf/x86: Fix format definition of SNB-EP uncore QPI boxYan, Zheng2012-07-262-1/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The event control register of SNB-EP uncore QPI box has a one bit extension at bit position 21. Reported-by: Stephane Eranian <eranian@google.com> Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1343097850-4348-1-git-send-email-zheng.z.yan@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | perf/x86: Make bitfield unsignedPeter Zijlstra2012-07-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix: arch/x86/kernel/cpu/perf_event.h:377:43: sparse: dubious one-bit signed bitfield Cc: Borislav Petkov <bp@amd64.org> Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-2jxkmktkppkclj1qe6qxd7ah@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | perf/x86: Fix LLC-* and node-* events on Intel SandyBridgeYan, Zheng2012-07-261-6/+86
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | LLC-* and node-* events require using the OFFCORE_RESPONSE events on SandyBridge, but the hw_cache_extra_regs is left uninitialized. This patch adds the missing extra register configure table for SandyBridge. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1342517275-2875-1-git-send-email-zheng.z.yan@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | perf/x86: Add Intel Nehalem-EX uncore supportYan, Zheng2012-07-262-129/+1352
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The uncore subsystem in Nehalem-EX consists of 7 components (U-Box, C-Box, B-Box, S-Box, R-Box, M-Box and W-Box). This patch is large because the way to program these boxes is diverse. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/4FF534F1.3030307@intel.com [ Improved the code. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | perf/x86: Fix typo in format definition of uncore PCU filterYan, Zheng2012-07-261-8/+8
| | |_|/ / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The format definition of uncore PCU filter should be filter_band* instead of filter_brand*. Reported-by: Stephane Eranian <eranian@google.com> Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1343024611-4692-1-git-send-email-zheng.z.yan@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | | ipc: use Kconfig options for __ARCH_WANT_[COMPAT_]IPC_PARSE_VERSIONWill Deacon2012-07-302-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rather than #define the options manually in the architecture code, add Kconfig options for them and select them there instead. This also allows us to select the compat IPC version parsing automatically for platforms using the old compat IPC interface. Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | firmware_map: make firmware_map_add_early() argument consistent with ↵Yasuaki Ishimatsu2012-07-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | firmware_map_add_hotplug() There are two ways to create /sys/firmware/memmap/X sysfs: - firmware_map_add_early When the system starts, it is calledd from e820_reserve_resources() - firmware_map_add_hotplug When the memory is hot plugged, it is called from add_memory() But these functions are called without unifying value of end argument as below: - end argument of firmware_map_add_early() : start + size - 1 - end argument of firmware_map_add_hogplug() : start + size The patch unifies them to "start + size". Even if applying the patch, /sys/firmware/memmap/X/end file content does not change. [akpm@linux-foundation.org: clarify comments] Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Reviewed-by: Dave Hansen <dave@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | atomic64_test: simplify the #ifdef for atomic64_dec_if_positive() testCatalin Marinas2012-07-301-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce CONFIG_ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE and use this instead of the multitude of #if defined() checks in atomic64_test.c Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds2012-07-2623-359/+541
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/mm changes from Peter Anvin: "The big change here is the patchset by Alex Shi to use INVLPG to flush only the affected pages when we only need to flush a small page range. It also removes the special INVALIDATE_TLB_VECTOR interrupts (32 vectors!) and replace it with an ordinary IPI function call." Fix up trivial conflicts in arch/x86/include/asm/apic.h (added code next to changed line) * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/tlb: Fix build warning and crash when building for !SMP x86/tlb: do flush_tlb_kernel_range by 'invlpg' x86/tlb: replace INVALIDATE_TLB_VECTOR by CALL_FUNCTION_VECTOR x86/tlb: enable tlb flush range support for x86 mm/mmu_gather: enable tlb flush range in generic mmu_gather x86/tlb: add tlb_flushall_shift knob into debugfs x86/tlb: add tlb_flushall_shift for specific CPU x86/tlb: fall back to flush all when meet a THP large page x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range x86/tlb_info: get last level TLB entry number of CPU x86: Add read_mostly declaration/definition to variables from smp.h x86: Define early read-mostly per-cpu macros
| * | | | | | x86/tlb: Fix build warning and crash when building for !SMPAlex Shi2012-07-201-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The incompatible parameter of flush_tlb_mm_range cause build warning. Fix it by correct parameter. Ingo Molnar found that this could also cause a user space crash. Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Alex Shi <alex.shi@intel.com> Link: http://lkml.kernel.org/r/1342747103-19765-1-git-send-email-alex.shi@intel.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * | | | | | x86/tlb: do flush_tlb_kernel_range by 'invlpg'Alex Shi2012-06-272-6/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch do flush_tlb_kernel_range by 'invlpg'. The performance pay and gain was analyzed in previous patch (x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range). In the testing: http://lkml.org/lkml/2012/6/21/10 The pay is mostly covered by long kernel path, but the gain is still quite clear, memory access in user APP can increase 30+% when kernel execute this funtion. Signed-off-by: Alex Shi <alex.shi@intel.com> Link: http://lkml.kernel.org/r/1340845344-27557-10-git-send-email-alex.shi@intel.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * | | | | | x86/tlb: replace INVALIDATE_TLB_VECTOR by CALL_FUNCTION_VECTORAlex Shi2012-06-275-306/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are 32 INVALIDATE_TLB_VECTOR now in kernel. That is quite big amount of vector in IDT. But it is still not enough, since modern x86 sever has more cpu number. That still causes heavy lock contention in TLB flushing. The patch using generic smp call function to replace it. That saved 32 vector number in IDT, and resolved the lock contention in TLB flushing on large system. In the NHM EX machine 4P * 8cores * HT = 64 CPUs, hackbench pthread has 3% performance increase. Signed-off-by: Alex Shi <alex.shi@intel.com> Link: http://lkml.kernel.org/r/1340845344-27557-9-git-send-email-alex.shi@intel.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * | | | | | x86/tlb: enable tlb flush range support for x86Alex Shi2012-06-273-70/+68
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Not every tlb_flush execution moment is really need to evacuate all TLB entries, like in munmap, just few 'invlpg' is better for whole process performance, since it leaves most of TLB entries for later accessing. This patch also rewrite flush_tlb_range for 2 purposes: 1, split it out to get flush_blt_mm_range function. 2, clean up to reduce line breaking, thanks for Borislav's input. My micro benchmark 'mummap' http://lkml.org/lkml/2012/5/17/59 show that the random memory access on other CPU has 0~50% speed up on a 2P * 4cores * HT NHM EP while do 'munmap'. Thanks Yongjie's testing on this patch: ------------- I used Linux 3.4-RC6 w/ and w/o his patches as Xen dom0 and guest kernel. After running two benchmarks in Xen HVM guest, I found his patches brought about 1%~3% performance gain in 'kernel build' and 'netperf' testing, though the performance gain was not very stable in 'kernel build' testing. Some detailed testing results are below. Testing Environment: Hardware: Romley-EP platform Xen version: latest upstream Linux kernel: 3.4-RC6 Guest vCPU number: 8 NIC: Intel 82599 (10GB bandwidth) In 'kernel build' testing in guest: Command line | performance gain make -j 4 | 3.81% make -j 8 | 0.37% make -j 16 | -0.52% In 'netperf' testing, we tested TCP_STREAM with default socket size 16384 byte as large packet and 64 byte as small packet. I used several clients to add networking pressure, then 'netperf' server automatically generated several threads to response them. I also used large-size packet and small-size packet in the testing. Packet size | Thread number | performance gain 16384 bytes | 4 | 0.02% 16384 bytes | 8 | 2.21% 16384 bytes | 16 | 2.04% 64 bytes | 4 | 1.07% 64 bytes | 8 | 3.31% 64 bytes | 16 | 0.71% Signed-off-by: Alex Shi <alex.shi@intel.com> Link: http://lkml.kernel.org/r/1340845344-27557-8-git-send-email-alex.shi@intel.com Tested-by: Ren, Yongjie <yongjie.ren@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * | | | | | x86/tlb: add tlb_flushall_shift knob into debugfsAlex Shi2012-06-272-0/+70
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kernel will replace cr3 rewrite with invlpg when tlb_flush_entries <= active_tlb_entries / 2^tlb_flushall_factor if tlb_flushall_factor is -1, kernel won't do this replacement. User can modify its value according to specific CPU/applications. Thanks for Borislav providing the help message of CONFIG_DEBUG_TLBFLUSH. Signed-off-by: Alex Shi <alex.shi@intel.com> Link: http://lkml.kernel.org/r/1340845344-27557-6-git-send-email-alex.shi@intel.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * | | | | | x86/tlb: add tlb_flushall_shift for specific CPUAlex Shi2012-06-274-6/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Testing show different CPU type(micro architectures and NUMA mode) has different balance points between the TLB flush all and multiple invlpg. And there also has cases the tlb flush change has no any help. This patch give a interface to let x86 vendor developers have a chance to set different shift for different CPU type. like some machine in my hands, balance points is 16 entries on Romely-EP; while it is at 8 entries on Bloomfield NHM-EP; and is 256 on IVB mobile CPU. but on model 15 core2 Xeon using invlpg has nothing help. For untested machine, do a conservative optimization, same as NHM CPU. Signed-off-by: Alex Shi <alex.shi@intel.com> Link: http://lkml.kernel.org/r/1340845344-27557-5-git-send-email-alex.shi@intel.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * | | | | | x86/tlb: fall back to flush all when meet a THP large pageAlex Shi2012-06-271-0/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We don't need to flush large pages by PAGE_SIZE step, that just waste time. and actually, large page don't need 'invlpg' optimizing according to our micro benchmark. So, just flush whole TLB is enough for them. The following result is tested on a 2CPU * 4cores * 2HT NHM EP machine, with THP 'always' setting. Multi-thread testing, '-t' paramter is thread number: without this patch with this patch ./mprotect -t 1 14ns 13ns ./mprotect -t 2 13ns 13ns ./mprotect -t 4 12ns 11ns ./mprotect -t 8 14ns 10ns ./mprotect -t 16 28ns 28ns ./mprotect -t 32 54ns 52ns ./mprotect -t 128 200ns 200ns Signed-off-by: Alex Shi <alex.shi@intel.com> Link: http://lkml.kernel.org/r/1340845344-27557-4-git-send-email-alex.shi@intel.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * | | | | | x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_rangeAlex Shi2012-06-277-44/+107
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | x86 has no flush_tlb_range support in instruction level. Currently the flush_tlb_range just implemented by flushing all page table. That is not the best solution for all scenarios. In fact, if we just use 'invlpg' to flush few lines from TLB, we can get the performance gain from later remain TLB lines accessing. But the 'invlpg' instruction costs much of time. Its execution time can compete with cr3 rewriting, and even a bit more on SNB CPU. So, on a 512 4KB TLB entries CPU, the balance points is at: (512 - X) * 100ns(assumed TLB refill cost) = X(TLB flush entries) * 100ns(assumed invlpg cost) Here, X is 256, that is 1/2 of 512 entries. But with the mysterious CPU pre-fetcher and page miss handler Unit, the assumed TLB refill cost is far lower then 100ns in sequential access. And 2 HT siblings in one core makes the memory access more faster if they are accessing the same memory. So, in the patch, I just do the change when the target entries is less than 1/16 of whole active tlb entries. Actually, I have no data support for the percentage '1/16', so any suggestions are welcomed. As to hugetlb, guess due to smaller page table, and smaller active TLB entries, I didn't see benefit via my benchmark, so no optimizing now. My micro benchmark show in ideal scenarios, the performance improves 70 percent in reading. And in worst scenario, the reading/writing performance is similar with unpatched 3.4-rc4 kernel. Here is the reading data on my 2P * 4cores *HT NHM EP machine, with THP 'always': multi thread testing, '-t' paramter is thread number: with patch unpatched 3.4-rc4 ./mprotect -t 1 14ns 24ns ./mprotect -t 2 13ns 22ns ./mprotect -t 4 12ns 19ns ./mprotect -t 8 14ns 16ns ./mprotect -t 16 28ns 26ns ./mprotect -t 32 54ns 51ns ./mprotect -t 128 200ns 199ns Single process with sequencial flushing and memory accessing: with patch unpatched 3.4-rc4 ./mprotect 7ns 11ns ./mprotect -p 4096 -l 8 -n 10240 21ns 21ns [ hpa: http://lkml.kernel.org/r/1B4B44D9196EFF41AE41FDA404FC0A100BFF94@SHSMSX101.ccr.corp.intel.com has additional performance numbers. ] Signed-off-by: Alex Shi <alex.shi@intel.com> Link: http://lkml.kernel.org/r/1340845344-27557-3-git-send-email-alex.shi@intel.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * | | | | | x86/tlb_info: get last level TLB entry number of CPUAlex Shi2012-06-274-0/+183
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For 4KB pages, x86 CPU has 2 or 1 level TLB, first level is data TLB and instruction TLB, second level is shared TLB for both data and instructions. For hupe page TLB, usually there is just one level and seperated by 2MB/4MB and 1GB. Although each levels TLB size is important for performance tuning, but for genernal and rude optimizing, last level TLB entry number is suitable. And in fact, last level TLB always has the biggest entry number. This patch will get the biggest TLB entry number and use it in furture TLB optimizing. Accroding Borislav's suggestion, except tlb_ll[i/d]_* array, other function and data will be released after system boot up. For all kinds of x86 vendor friendly, vendor specific code was moved to its specific files. Signed-off-by: Alex Shi <alex.shi@intel.com> Link: http://lkml.kernel.org/r/1340845344-27557-2-git-send-email-alex.shi@intel.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>
OpenPOWER on IntegriCloud