| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
memblock is now fully integrated into the kernel and is the prefered
method for tracking memory. Rather than reinvent the wheel with
meminfo, migrate to using memblock directly instead of meminfo as
an intermediate.
Acked-by: Jason Cooper <jason@lakedaemon.net>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Acked-by: Kukjin Kim <kgene.kim@samsung.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Leif Lindholm <leif.lindholm@linaro.org>
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Due to a design incompatibility between the PCIe Marvell controller
and the Cortex-A9, stressing PCIe devices with a lot of traffic
quickly causes a deadlock.
One part of the workaround for this is to have all PCIe regions mapped
as strongly-ordered (MT_UNCACHED) instead of the default
MT_DEVICE. While the arch_ioremap_caller() mechanism allows
sub-architecture code to override ioremap(), used to map PCIe memory
regions, there isn't such a mechanism to override the behavior of
pci_ioremap_io().
This commit adds the arch_pci_ioremap_mem_type variable, initialized
to MT_DEVICE by default, and that sub-architecture code can
override. We have chosen to expose a single variable rather than
offering the possibility of overriding the entire pci_ioremap_io(),
because implementing pci_ioremap_io() requires calling functions
(get_mem_type()) that are private to the arch/arm/mm/ code.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
|
|
|
|
|
| |
A small mistake slipped through in memory.txt.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When configure kprobe events of ftrace with "stacktrace" option enabled
in arm, there is no stacktrace was recorded after the kprobe event was
triggered. The root cause is no save_stack_trace_regs() function implemented.
Implement the save_stack_trace_regs() function in arm, then ftrace will
call this architecture-related function to record the stacktrace into
ring buffer.
After this fix, stacktrace can be recorded, for example:
# mount -t debugfs nodev /sys/kernel/debug
# echo "p:netrx net_rx_action" >> /sys/kernel/debug/tracing/kprobe_events
# echo 1 > /sys/kernel/debug/tracing/events/kprobes/netrx/enable
# echo 1 > /sys/kernel/debug/tracing/options/stacktrace
# echo 1 > /sys/kernel/debug/tracing/tracing_on
# ping 127.0.0.1 -c 1
# echo 0 > /sys/kernel/debug/tracing/tracing_on
# cat /sys/kernel/debug/tracing/trace
# tracer: nop
#
# entries-in-buffer/entries-written: 12/12 #P:1
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
<------ missing some entries ---------------->
ping-1200 [000] dNs1 667.603250: netrx: (net_rx_action+0x0/0x1f8)
ping-1200 [000] dNs1 667.604738: <stack trace>
=> net_rx_action
=> do_softirq
=> local_bh_enable
=> ip_finish_output
=> ip_output
=> ip_local_out
=> ip_send_skb
=> ip_push_pending_frames
=> raw_sendmsg
=> inet_sendmsg
=> sock_sendmsg
=> SyS_sendto
=> ret_fast_syscall
Signed-off-by: Lin Yongting <linyongting@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
|
|
|
|
|
|
|
| |
Support for ARM710 CPUs was removed in v3.5. Now remove the last code
depending on its Kconfig macro.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We will reach fixup handler when one thread(say cpu0) caused an undefined exception, while another thread(say cpu1) is unmmaping the page.
Fixup handler returns to the next userspace instruction which has caused the undef execption, rather than going to the same instruction.
ARM ARM says that after undefined exception, the PC will be pointing
to the next instruction. ie +4 offset in case of ARM and +2 in case of Thumb
And there is no correction offset passed to vector_stub in case of
undef exception.
File: arch/arm/kernel/entry-armv.S +1085
vector_stub und, UND_MODE
During an undefined exception, in normal scenario(ie when ldrt
instruction does not cause an abort) after resorting the context in
VFP hardware, the PC is modified as show below before jumping to
ret_from_exception which is in r9.
File: arch/arm/vfp/vfphw.S +169
@ The context stored in the VFP hardware is up to date with this thread
vfp_hw_state_valid:
tst r1, #FPEXC_EX
bne process_exception @ might as well handle the pending
@ exception before retrying branch
@ out before setting an FPEXC that
@ stops us reading stuff
VFPFMXR FPEXC, r1 @ Restore FPEXC last
sub r2, r2, #4 @ Retry current instruction - if Thumb
str r2, [sp, #S_PC] @ mode it's two 16-bit instructions,
@ else it's one 32-bit instruction, so
@ always subtract 4 from the following
@ instruction address.
But if ldrt results in an abort, we reach the fixup handler and return
to ret_from_execption without correcting the pc.
This patch modifes the fixup handler to re-execute the same instruction which caused undefined execption.
Signed-off-by: Vinayak Menon <vinayakm.list@gmail.com>
Signed-off-by: Arun KS <getarunks@gmail.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
|
|
|
|
|
|
|
|
|
| |
asm-generic offers an atomic-add based rwsem implementation, which
can avoid the need for heavier, spinlock-based synchronisation on the
fast path.
This patch makes use of the optimised implementation for ARM CPUs.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
|
|
|
|
|
|
| |
The Cortex-A17 PMU is identical to that of the A12, so wire up a new
compatible string to the existing event structures.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On CPUs with virtualization extensions the kernel installs HYP mode
configuration on both primary and secondary cpus upon cold boot.
On platforms where CPUs are shutdown in idle paths (ie CPU core gating),
when a CPU resumes from low-power states it currently does not execute
code that reinstalls the HYP configuration, which means that the kernel
cannot run eg KVM properly on such machines.
This patch, mirroring cold-boot behaviour, executes position independent
code that reinstalls HYP configuration and drops to SVC mode safely on
warmboot, so that deep idle states can be enabled in kernel running as
hosts on platforms with power management HW.
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Dave Martin <dave.martin@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After instruction write into xol area, on ARM V7
architecture code need to flush dcache and icache to sync
them up for given set of addresses. Having just
'flush_dcache_page(page)' call is not enough - it is
possible to have stale instruction sitting in icache
for given xol area slot address.
Introduce arch_uprobe_ixol_copy weak function
that by default calls uprobes copy_to_page function and
than flush_dcache_page function and on ARM define new one
that handles xol slot copy in ARM specific way
flush_uprobe_xol_access function shares/reuses implementation
with/of flush_ptrace_access function and takes care of writing
instruction to user land address space on given variety of
different cache types on ARM CPUs. Because
flush_uprobe_xol_access does not have vma around
flush_ptrace_access was split into two parts. First that
retrieves set of condition from vma and common that receives
those conditions as flags.
Note ARM cache flush function need kernel address
through which instruction write happened, so instead
of using uprobes copy_to_page function changed
code to explicitly map page and do memcpy.
Note arch_uprobe_copy_ixol function, in similar way as
copy_to_user_page function, has preempt_disable/preempt_enable.
Signed-off-by: Victor Kamensky <victor.kamensky@linaro.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: David A. Long <dave.long@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The name "power_down_finish" seems to be causing some confusion,
because it suggests that this function is responsible for taking
some action to cause the specified CPU to complete its power down.
This patch renames the affected functions to "wait_for_powerdown"
and similar, since this function's intended purpose is just to wait
for the hardware to finish a powerdown initiated by a previous
cpu_power_down.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
|
|
|
|
|
|
| |
dsb st can be used to ensure completion of pending cache maintenance
operations, so use it for the v7 cache maintenance operations.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
|
|
|
|
|
|
| |
Cortex-A17 has identical initialisation requirements to Cortex-A12, so
hook it up in proc-v7.S in the same way.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
|
|
|
|
|
|
|
|
|
| |
With large kernel builds such as allyesconfig exceeding maximum relative
branch offsets, the init section will be too far away to branch to
directly. This causes veneers to be added by the linker, but veneers
don't work before the MMU is enabled. Fix this by moving __fixup_smp to
the .head.text section as it is not very big.
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When we unwind through an exception stack, include the saved PC value
into the stack trace: this fills in an otherwise missed functions from
the trace (as indicated below):
[<c03f4424>] fec_enet_interrupt+0xa0/0xe8
[<c0066c0c>] handle_irq_event_percpu+0x68/0x228
[<c0066e18>] handle_irq_event+0x4c/0x6c
[<c006a024>] handle_fasteoi_irq+0xac/0x198
[<c00664b0>] generic_handle_irq+0x4c/0x60
[<c000f014>] handle_IRQ+0x40/0x98
[<c0008554>] gic_handle_irq+0x30/0x64
[<c0012900>] __irq_svc+0x40/0x50
[<c0029030>] __do_softirq+0xe0/0x2fc <====
[<c0029500>] irq_exit+0xb0/0x100
[<c000f018>] handle_IRQ+0x44/0x98
[<c0008554>] gic_handle_irq+0x30/0x64
[<c0012900>] __irq_svc+0x40/0x50
[<c000f34c>] arch_cpu_idle+0x30/0x38 <====
[<c005e1e4>] cpu_startup_entry+0xac/0x214
[<c066297c>] rest_init+0x68/0x80
[<c08ccb10>] start_kernel+0x2fc/0x358
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While debugging the FEC ethernet driver using stacktrace, it was noticed
that the stacktraces always begin as follows:
[<c00117b4>] save_stack_trace_tsk+0x0/0x98
[<c0011870>] save_stack_trace+0x24/0x28
...
This is because the stack trace code includes the stack frames for itself.
This is incorrect behaviour, and also leads to "skip" doing the wrong
thing (which is the number of stack frames to avoid recording.)
Perversely, it does the right thing when passed a non-current thread. Fix
this by ensuring that we have a known constant number of frames above the
main stack trace function, and always skip these.
Cc: <stable@vger.kernel.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Avoid calling dma_cache_maint_page() when unmapping a DMA_TO_DEVICE
buffer. The L1 cache ops never do anything in this circumstance, nor
do they ever need to - all that matters for this case is that the data
written is visible to the device before DMA starts. What happens during
the transfer (provided the buffer is not written to) is of no real
consequence.
We already do this optimisation for the L2 cache.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When enable LPAE and big-endian in a hisilicon board, while specify
mem=384M mem=512M@7680M, will get bad page state:
Freeing unused kernel memory: 180K (c0466000 - c0493000)
BUG: Bad page state in process init pfn:fa442
page:c7749840 count:0 mapcount:-1 mapping: (null) index:0x0
page flags: 0x40000400(reserved)
Modules linked in:
CPU: 0 PID: 1 Comm: init Not tainted 3.10.27+ #66
[<c000f5f0>] (unwind_backtrace+0x0/0x11c) from [<c000cbc4>] (show_stack+0x10/0x14)
[<c000cbc4>] (show_stack+0x10/0x14) from [<c009e448>] (bad_page+0xd4/0x104)
[<c009e448>] (bad_page+0xd4/0x104) from [<c009e520>] (free_pages_prepare+0xa8/0x14c)
[<c009e520>] (free_pages_prepare+0xa8/0x14c) from [<c009f8ec>] (free_hot_cold_page+0x18/0xf0)
[<c009f8ec>] (free_hot_cold_page+0x18/0xf0) from [<c00b5444>] (handle_pte_fault+0xcf4/0xdc8)
[<c00b5444>] (handle_pte_fault+0xcf4/0xdc8) from [<c00b6458>] (handle_mm_fault+0xf4/0x120)
[<c00b6458>] (handle_mm_fault+0xf4/0x120) from [<c0013754>] (do_page_fault+0xfc/0x354)
[<c0013754>] (do_page_fault+0xfc/0x354) from [<c0008400>] (do_DataAbort+0x2c/0x90)
[<c0008400>] (do_DataAbort+0x2c/0x90) from [<c0008fb4>] (__dabt_usr+0x34/0x40)
The bad pfn:fa442 is not system memory(mem=384M mem=512M@7680M), after debugging,
I find in page fault handler, will get wrong pfn from pte just after set pte,
as follow:
do_anonymous_page()
{
...
set_pte_at(mm, address, page_table, entry);
//debug code
pfn = pte_pfn(entry);
pr_info("pfn:0x%lx, pte:0x%llxn", pfn, pte_val(entry));
//read out the pte just set
new_pte = pte_offset_map(pmd, address);
new_pfn = pte_pfn(*new_pte);
pr_info("new pfn:0x%lx, new pte:0x%llxn", pfn, pte_val(entry));
...
}
pfn: 0x1fa4f5, pte:0xc00001fa4f575f
new_pfn:0xfa4f5, new_pte:0xc00000fa4f5f5f //new pfn/pte is wrong.
The bug is happened in cpu_v7_set_pte_ext(ptep, pte):
An LPAE PTE is a 64bit quantity, passed to cpu_v7_set_pte_ext in the r2 and r3 registers.
On an LE kernel, r2 contains the LSB of the PTE, and r3 the MSB.
On a BE kernel, the assignment is reversed.
Unfortunately, the current code always assumes the LE case,
leading to corruption of the PTE when clearing/setting bits.
This patch fixes this issue much like it has been done already in the
cpu_v7_switch_mm case.
CC stable <stable@vger.kernel.org>
Signed-off-by: Jianguo Wu <wujianguo@huawei.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The Undef abort handler in the kernel reads the undefined instruction
from user space. If the page table was modified from another CPU, the
user access could fail and do_page_fault() will be executed with
interrupts disabled. This can potentially deadlock on ARM11MPCore or on
Cortex-A15 with erratum 798181 workaround enabled (both implying IPI for
TLB maintenance with page table lock held).
This patch enables the IRQs in __und_usr before attempting to read the
instruction from user space.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Arun KS <getarunks@gmail.com>
Cc: Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Ryan Mallon <rmallon@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
|
|
|
|
|
|
|
|
| |
This patch is in preparation for calling the crunch_task_enable()
function with interrupts enabled.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Ryan Mallon <rmallon@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
|
|
|
|
|
|
| |
This patch is in preparation for calling the iwmmxt_task_enable()
function with interrupts enabled.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In 32-bit ARM systems, the fixmap mapping region can support no more
than 14 CPUs(total: 896k; one CPU: 64K). And we can configure NR_CPUS
up to 32. So there is a mismatch.
This patch moves fixmapping region downwards to region 0xffc00000-
0xffe00000. Then the fixmap mapping region can support up to 32 CPUs.
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Liu Hua <sdu.liu@huawei.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It seems that these two macros are not used by non architecture
specific code. And on ARM FIX_KMAP_BEGIN equals zero.
This patch removes these two macros. Instead, using FIX_KMAP_NR_PTES to
tell the pte number belonged to fixmap mapping region. The code will
become clearer when I introduce a bugfix on fixmap mapping region.
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Liu Hua <sdu.liu@huawei.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
|
|
|
|
|
|
|
| |
PJ4B needs extra instructions for suspend and resume, so instead of
using the armv7 version, this commit introduces specific versions for
PJ4B.
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
|
|
|
|
|
|
|
|
|
|
| |
It looks like the static mapping area for DMA was replaced by dynamic
allocation into the vmalloc area by commit e9da6e9905e6 but the
information in Documentation/arm/memory.txt was not removed accordingly.
CONSISTENT_END in arch/arm/include/asm/memory.h has no more users and
can be removed as well.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
|
|
|
|
|
|
|
|
|
| |
Make ftrace work with CONFIG_DEBUG_SET_MODULE_RONX by making module text
writable around the place where ftrace does its work, like it is done on
x86 in the patch which introduced CONFIG_DEBUG_SET_MODULE_RONX,
84e1c6bb38eb ("x86: Add RO/NX protection for loadable kernel modules").
Tested-by: Mitchel Humpherys <mitchelh@codeaurora.org>
Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Enable hibernation for ARM architectures and provide ARM
architecture specific calls used during hibernation.
The swsusp hibernation framework depends on the
platform first having functional suspend/resume.
Then, in order to enable hibernation on a given platform, a
platform_hibernation_ops structure may need to be registered with
the system in order to save/restore any SoC-specific / cpu specific
state needing (re)init over a suspend-to-disk/resume-from-disk cycle.
For example:
- "secure" SoCs that have different sets of control registers
and/or different CR reg access patterns.
- SoCs with L2 caches as the activation sequence there is
SoC-dependent; a full off-on cycle for L2 is not done
by the hibernation support code.
- SoCs requiring steps on wakeup _before_ the "generic" parts
done by cpu_suspend / cpu_resume can work correctly.
- SoCs having persistent state which is maintained during suspend
and resume, but will be lost during the power off cycle after
suspend-to-disk.
This is a rebase/rework of Frank Hofmann's v5 hibernation patchset.
Acked-by: Russ Dill <Russ.Dill@ti.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Signed-off-by: Sebastian Capella <sebastian.capella@linaro.org>
Acked-by: Pavel Machek <pavel@ucw.cz>
Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
[fixed duplicate virt_to_pfn() definition --rmk]
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
|
|
|
|
|
| |
Use kcalloc() and ULONG_MAX rather than open coding them.
Signed-off-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some versions of gcc even warn about it:
mm/shmem.c: In function ‘shmem_file_aio_read’:
mm/shmem.c:1414: warning: ‘error’ may be used uninitialized in this function
If the loop is aborted during the first iteration by one of the two
first break statements, error will be uninitialized.
Introduced by commit 6e58e79db8a1 ("introduce copy_page_to_iter, kill
loop over iovec in generic_file_aio_read()").
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On 32 bit, size_t is "unsigned int", not "unsigned long", causing the
following warning when comparing with PAGE_SIZE, which is always "unsigned
long":
fs/cifs/file.c: In function ‘cifs_readdata_to_iov’:
fs/cifs/file.c:2757: warning: comparison of distinct pointer types lacks a cast
Introduced by commit 7f25bba819a3 ("cifs_iovec_read: keep iov_iter
between the calls of cifs_readdata_to_iov()"), which changed the
signedness of "remaining" and the code from min_t() to min().
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux
Pull slab changes from Pekka Enberg:
"The biggest change is byte-sized freelist indices which reduces slab
freelist memory usage:
https://lkml.org/lkml/2013/12/2/64"
* 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux:
mm: slab/slub: use page->list consistently instead of page->lru
mm/slab.c: cleanup outdated comments and unify variables naming
slab: fix wrongly used macro
slub: fix high order page allocation problem with __GFP_NOFAIL
slab: Make allocations with GFP_ZERO slightly more efficient
slab: make more slab management structure off the slab
slab: introduce byte sized index for the freelist of a slab
slab: restrict the number of objects in a slab
slab: introduce helper functions to get/set free object
slab: factor out calculate nr objects in cache_estimate
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
'struct page' has two list_head fields: 'lru' and 'list'. Conveniently,
they are unioned together. This means that code can use them
interchangably, which gets horribly confusing like with this nugget from
slab.c:
> list_del(&page->lru);
> if (page->active == cachep->num)
> list_add(&page->list, &n->slabs_full);
This patch makes the slab and slub code use page->lru universally instead
of mixing ->list and ->lru.
So, the new rule is: page->lru is what the you use if you want to keep
your page on a list. Don't like the fact that it's not called ->list?
Too bad.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
As time goes, the code changes a lot, and this leads to that
some old-days comments scatter around , which instead of faciliating
understanding, but make more confusion. So this patch cleans up them.
Also, this patch unifies some variables naming.
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Jianyu Zhan <nasa4836@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
commit 'slab: restrict the number of objects in a slab' uses
__builtin_constant_p() on #if macro. It is wrong usage of builtin
function, but it is compiled on x86 without any problem, so I can't
find it before 0 day build system find it.
This commit fixes the situation by using KMALLOC_MIN_SIZE, instead of
KMALLOC_SHIFT_LOW. KMALLOC_SHIFT_LOW is parsed to ilog2() on some
architecture and this ilog2() uses __builtin_constant_p() and results in
the problem. This problem would disappear by using KMALLOC_MIN_SIZE,
since it is just constant.
Tested-by: David Rientjes <rientjes@google.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
SLUB already try to allocate high order page with clearing __GFP_NOFAIL.
But, when allocating shadow page for kmemcheck, it missed clearing
the flag. This trigger WARN_ON_ONCE() reported by Christian Casteyde.
https://bugzilla.kernel.org/show_bug.cgi?id=65991
https://lkml.org/lkml/2013/12/3/764
This patch fix this situation by using same allocation flag as original
allocation.
Reported-by: Christian Casteyde <casteyde.christian@free.fr>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Use the likely mechanism already around valid
pointer tests to better choose when to memset
to 0 allocations with __GFP_ZERO
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Now, the size of the freelist for the slab management diminish,
so that the on-slab management structure can waste large space
if the object of the slab is large.
Consider a 128 byte sized slab. If on-slab is used, 31 objects can be
in the slab. The size of the freelist for this case would be 31 bytes
so that 97 bytes, that is, more than 75% of object size, are wasted.
In a 64 byte sized slab case, no space is wasted if we use on-slab.
So set off-slab determining constraint to 128 bytes.
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Currently, the freelist of a slab consist of unsigned int sized indexes.
Since most of slabs have less number of objects than 256, large sized
indexes is needless. For example, consider the minimum kmalloc slab. It's
object size is 32 byte and it would consist of one page, so 256 indexes
through byte sized index are enough to contain all possible indexes.
There can be some slabs whose object size is 8 byte. We cannot handle
this case with byte sized index, so we need to restrict minimum
object size. Since these slabs are not major, wasted memory from these
slabs would be negligible.
Some architectures' page size isn't 4096 bytes and rather larger than
4096 bytes (One example is 64KB page size on PPC or IA64) so that
byte sized index doesn't fit to them. In this case, we will use
two bytes sized index.
Below is some number for this patch.
* Before *
kmalloc-512 525 640 512 8 1 : tunables 54 27 0 : slabdata 80 80 0
kmalloc-256 210 210 256 15 1 : tunables 120 60 0 : slabdata 14 14 0
kmalloc-192 1016 1040 192 20 1 : tunables 120 60 0 : slabdata 52 52 0
kmalloc-96 560 620 128 31 1 : tunables 120 60 0 : slabdata 20 20 0
kmalloc-64 2148 2280 64 60 1 : tunables 120 60 0 : slabdata 38 38 0
kmalloc-128 647 682 128 31 1 : tunables 120 60 0 : slabdata 22 22 0
kmalloc-32 11360 11413 32 113 1 : tunables 120 60 0 : slabdata 101 101 0
kmem_cache 197 200 192 20 1 : tunables 120 60 0 : slabdata 10 10 0
* After *
kmalloc-512 521 648 512 8 1 : tunables 54 27 0 : slabdata 81 81 0
kmalloc-256 208 208 256 16 1 : tunables 120 60 0 : slabdata 13 13 0
kmalloc-192 1029 1029 192 21 1 : tunables 120 60 0 : slabdata 49 49 0
kmalloc-96 529 589 128 31 1 : tunables 120 60 0 : slabdata 19 19 0
kmalloc-64 2142 2142 64 63 1 : tunables 120 60 0 : slabdata 34 34 0
kmalloc-128 660 682 128 31 1 : tunables 120 60 0 : slabdata 22 22 0
kmalloc-32 11716 11780 32 124 1 : tunables 120 60 0 : slabdata 95 95 0
kmem_cache 197 210 192 21 1 : tunables 120 60 0 : slabdata 10 10 0
kmem_caches consisting of objects less than or equal to 256 byte have
one or more objects than before. In the case of kmalloc-32, we have 11 more
objects, so 352 bytes (11 * 32) are saved and this is roughly 9% saving of
memory. Of couse, this percentage decreases as the number of objects
in a slab decreases.
Here are the performance results on my 4 cpus machine.
* Before *
Performance counter stats for 'perf bench sched messaging -g 50 -l 1000' (10 runs):
229,945,138 cache-misses ( +- 0.23% )
11.627897174 seconds time elapsed ( +- 0.14% )
* After *
Performance counter stats for 'perf bench sched messaging -g 50 -l 1000' (10 runs):
218,640,472 cache-misses ( +- 0.42% )
11.504999837 seconds time elapsed ( +- 0.21% )
cache-misses are reduced by this patchset, roughly 5%.
And elapsed times are improved by 1%.
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
To prepare to implement byte sized index for managing the freelist
of a slab, we should restrict the number of objects in a slab to be less
or equal to 256, since byte only represent 256 different values.
Setting the size of object to value equal or more than newly introduced
SLAB_OBJ_MIN_SIZE ensures that the number of objects in a slab is less or
equal to 256 for a slab with 1 page.
If page size is rather larger than 4096, above assumption would be wrong.
In this case, we would fall back on 2 bytes sized index.
If minimum size of kmalloc is less than 16, we use it as minimum object
size and give up this optimization.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
In the following patches, to get/set free objects from the freelist
is changed so that simple casting doesn't work for it. Therefore,
introduce helper functions.
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This logic is not simple to understand so that making separate function
helping readability. Additionally, we can use this change in the
following patch which implement for freelist to have another sized index
in according to nr objects.
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
Pull misc kbuild changes from Michal Marek:
"Here is the non-critical part of kbuild:
- One bogus coccinelle check removed, one check fixed not to suggest
the obsolete PTR_RET macro
- scripts/tags.sh does not index the generated *.mod.c files
- new objdiff tool to list differences between two versions of an
object file
- A fix for scripts/bootgraph.pl"
* 'misc' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
scripts/coccinelle: Use PTR_ERR_OR_ZERO
scripts/bootgraph.pl: Add graphic header
scripts: objdiff: detect object code changes between two commits
Coccicheck: Remove memcpy to struct assignment test
scripts/tags.sh: Ignore *.mod.c
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
PTR_RET is deprecated. Do not recommend its usage anymore.
Use PTR_ERR_OR_ZERO instead.
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: Michal Marek <mmarek@suse.cz>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Adding -header + help function like other .pl in /scripts.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Michal Marek <mmarek@suse.cz>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
objdiff is useful when doing large code cleanups. For example, when
removing checkpatch warnings and errors from new drivers in the staging
tree.
objdiff can be used in conjunction with a git rebase to confirm that
each commit made no changes to the resulting object code. It has the
same return values as diff(1).
This was written specifically to support adding the skein and threefish
cryto drivers to the staging tree. I needed a programmatic way to
confirm that commits changing >90% of the lines didn't inadvertently
change the code.
Temporary files (objdump output) are stored in
/path/to/linux/.tmp_objdiff
'make mrproper' will remove this directory.
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Michal Marek <mmarek@suse.cz>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The Coccinelle script scripts/coccinelle/misc/memcpy-assign.cocci look
for opportunities to replace a call to memcpy by a struct assignment.
This patch removes memcpy-assign.cocci as it is not clear that this
convention has an impact on the generated code.
Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com>
Signed-off-by: Michal Marek <mmarek@suse.cz>
|
| |/
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
CONFIG_MODVERSIONS=y results in a .mod.c for every compiled file in the
kernel. Issuing a 'make cscope' on a compiled kernel tree results in
the cscope files containing *.mod.c files.
[prarit@prarit linux]# make cscope
[prarit@prarit linux]# cat cscope.files | grep mod.c | wc -l
4807
These files are not useful for cscope and should be ignored. For example,
# line filename / context / line
1 105 arch/x86/kvm/kvm-intel.mod.c <<GLOBAL>>
{ 0x618911fc, __VMLINUX_SYMBOL_STR(numa_node) },
2 508 drivers/block/mtip32xx/mtip32xx.h <<GLOBAL>>
int numa_node;
3 55 drivers/block/mtip32xx/mtip32xx.mod.c <<GLOBAL>>
{ 0x618911fc, __VMLINUX_SYMBOL_STR(numa_node) },
4 37 drivers/cpufreq/acpi-cpufreq.mod.c <<GLOBAL>>
{ 0x618911fc, __VMLINUX_SYMBOL_STR(numa_node) },
<snip>
Add an export to RCS_FIND_IGNORE so it can be used in scripts/tags.sh
and add explicitly ignore *.mod.c files.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Cc: Michael Opdenacker <michael.opdenacker@free-electrons.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Michal Marek <mmarek@suse.cz>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This patch fixes I/O errors with the sym53c8xx_2 driver when the disk
returns QUEUE FULL status.
When the controller encounters an error (including QUEUE FULL or BUSY
status), it aborts all not yet submitted requests in the function
sym_dequeue_from_squeue.
This function aborts them with DID_SOFT_ERROR.
If the disk has full tag queue, the request that caused the overflow is
aborted with QUEUE FULL status (and the scsi midlayer properly retries
it until it is accepted by the disk), but the sym53c8xx_2 driver aborts
the following requests with DID_SOFT_ERROR --- for them, the midlayer
does just a few retries and then signals the error up to sd.
The result is that disk returning QUEUE FULL causes request failures.
The error was reproduced on 53c895 with COMPAQ BD03685A24 disk
(rebranded ST336607LC) with command queue 48 or 64 tags. The disk has
64 tags, but under some access patterns it return QUEUE FULL when there
are less than 64 pending tags. The SCSI specification allows returning
QUEUE FULL anytime and it is up to the host to retry.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Commit 8f619b5429d9 ("powerpc/ppc64: Do not turn AIL (reloc-on
interrupts) too early") added code to set the AIL bit in the LPCR
without checking whether the kernel is running in hypervisor mode. The
result is that when the kernel is running as a guest (i.e., under
PowerKVM or PowerVM), the processor takes a privileged instruction
interrupt at that point, causing a panic. The visible result is that
the kernel hangs after printing "returning from prom_init".
This fixes it by checking for hypervisor mode being available before
setting LPCR. If we are not in hypervisor mode, we enable relocation-on
interrupts later in pSeries_setup_arch using the H_SET_MODE hcall.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|