| Commit message (Collapse) | Author | Age | Files | Lines |
|\ \ \ |
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Now that we support a timer-backed delay loop, I'm quickly getting sick
and tired of people complaining that their beloved bogomips value has
decreased. You know who you are!
This patch removes the bogomips line from /proc/cpuinfo, based on the
reasoning that any program parsing this is already broken and, as such,
won't be further broken if the field is removed.
Acked-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
It appears that gcc may put some code in ".text.unlikely" or
".text.hot" sections. Right now those aren't accounted for in unwind
tables. Add them.
I found some docs about this at:
http://gcc.gnu.org/onlinedocs/gcc-4.6.2/gcc.pdf
Without this, if you have slub_debug turned on, you can get messages
that look like this:
unwind: Index not found 7f008c50
Signed-off-by: Doug Anderson <dianders@chromium.org>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The newly introduced function is to be used as .restart callback for
ARMv7-M machines. The used register is architecturally defined, so it
should work for all M-class machines.
Acked-by: Jonathan Austin <jonathan.austin@arm.com>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
On Cortex-A15 CPUs up to and including r0p4, in certain rare sequences
of code, the loop buffer may deliver incorrect instructions. This
workaround disables the loop buffer to avoid the erratum.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Add a file to Documentation/arm explaining how kernel mode NEON
is supposed to be used.
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
- Recommend that the kernel be placed under 128MiB, this is required if
CONFIG_AUTO_ZRELADDR=y and good (or at least not bad) advice even if
CONFIG_AUTO_ZRELADDR=n.
- Recommend that a zImage kernel be placed above 32MiB, this avoids the need
to relocate prior to decompression, which can speed up boot.
- Add basic info on the requirements when loading a raw (non-zImage) kernel
which are stricter than the zImage requirements.
- Recommend that the DTB be placed after the 128MiB boundary, avoiding any
potential conflict with the kernel decompressor, and within the lowmem
region. In practice it could follow the kernel loaded after 32MiB,
assuming the kernel decompesses to less than 32MiB, but the 128MiB
recommendation is simple and unambiguous.
- Add similar recommendation regarding initramfs.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Dave Martin <dave.martin@linaro.org>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: Roy Franz <roy.franz@linaro.org>
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The C99 types uintXX_t that are usually defined in 'stdint.h' are not as
unambiguous on ARM as you would expect. For the types below, there is a
difference on ARM between GCC built for bare metal ARM, GCC built for glibc
and the kernel itself, which results in build errors if you try to build with
-ffreestanding and include 'stdint.h' (such as when you include 'arm_neon.h'
in order to use NEON intrinsics)
As the typedefs for these types in 'stdint.h' are based on builtin defines
supplied by GCC, we can tweak these to align with the kernel's idea of those
types, so 'linux/types.h' and 'stdint.h' can be safely included from the same
source file (provided that -ffreestanding is used).
int32_t uint32_t uintptr_t
bare metal GCC long unsigned long unsigned int
glibc GCC int unsigned int unsigned int
kernel int unsigned int unsigned long
Acked by: Dave Martin <dave.martin@arm.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Mikael Pettersson <mikpe@it.uu.se>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
[ this is a follow-up to this discussion:
http://archive.arm.linux.org.uk/lurker/message/20130730.230827.a1ceb12a.en.html ]
This patchset renames all uses of "bcm," name bindings to
"brcm," as they were done prior to knowing that brcm had
already been standardized as Broadcom vendor prefix
(in Documentation/devicetree/bindings/vendor-prefixes.txt).
This will not cause any churn on devices because none of
these bindings have made it into production yet.
Acked-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Christian Daudt <csd@broadcom.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Currently we have the following output from cache-l2x0:
l2x0: 16 ways, CACHE_ID 0x410000c7, AUX_CTRL 0x32070000, Cache size: 1048576 B
Using kB for the cache size can improve readability a bit:
l2x0: 16 ways, CACHE_ID 0x410000c7, AUX_CTRL 0x32070000, Cache size: 1024 kB
While at it use pr_info.
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Add support for suspend/resume operations. The implemented procedures
are identical to the ones for ARM926.
Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
All timer interrupts and the perf interrupt are marked NO_THREAD, so
its safe to allow forced interrupt threading.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
PMU interrupts must not be threaded.
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
THe L_PTE_USER actually has nothing to do with stage 2 mappings and the
L_PTE_S2_RDWR value sets the readable bit, which was what L_PTE_USER
was used for before proper handling of stage 2 memory defines.
Changelog:
[v3]: Drop call to kvm_set_s2pte_writable in mmu.c
[v2]: Change default mappings to be r/w instead of r/o, as per Marc
Zyngier's suggestion.
Cc: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
General forms of huge_pte_alloc, huge_pte_offset and follow_huge_pmd
are now available in mm/hugetlb.c.
This patch removes the ARM copies of these functions and activates
the general ones by enabling:
CONFIG_ARCH_WANT_GENERAL_HUGETLB
Signed-off-by: Steve Capper <steve.capper@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Allow users to configure the Hz rate on implementations which do not have
a fixed clock rate.
Whenever HZ_FIXED is not set through one of its "default" settings, the
user is allowed to select a tick rate from 100, 200, 250, 300, 500Hz
and 1kHz. This is slightly more choice than with the generic HZ
selection in kernel/Kconfig.hz (which only does 100, 250, 300Hz and
1kHz). The reason for including 200Hz is that a greater number of other
platforms want that via the fixed rate, and 500Hz just seemed to be a
better middle value than 300Hz (which is of course very close to 250.)
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
struct machine_desc records are defined everywhere as a 'const'
structure, but unfortuantely it loses its const-ness through the use of
linker magic - the symbols which surround the section are not declared
const so it becomes possible not to use 'const' for pointers to these
const structures.
Let's fix this oversight - all pointers to these structures should be
marked const too.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
| | |\ \
| | | | |
| | | | |
| | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/will/linux into devel-stable
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
do_cache_op finds the lowest VMA contained in the specified address
range and rounds the range to cover only the mapped addresses.
Since commit 4542b6a0fa6b ("ARM: 7365/1: drop unused parameter from
flush_cache_user_range") the VMA is not used for anything else in this
code and seeing as the low-level cache flushing routines return -EFAULT
if the address is not valid, there is no need for this range truncation.
This patch removes the VMA handling code from the cacheflushing syscall.
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The flush_cache_user_range macro takes a pair of addresses describing
the start and end of the virtual address range to flush. Due to an
accidental oversight when flush_cache_range_user was introduced, the
address range was rounded up so that the start and end addresses were
page-aligned.
For historical reference, the interesting commits in history.git are:
10eacf1775e1 ("[ARM] Clean up ARM cache handling interfaces (part 1)")
71432e79b76b ("[ARM] Add flush_cache_user_page() for sys_cacheflush()")
This patch removes the alignment code, reducing the amount of flushing
required for ranges that are not an exact multiple of PAGE_SIZE.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Jonathan Austin <jonathan.austin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Flushing a large, non-faulting VMA from userspace can potentially result
in a long time spent flushing the cache line-by-line without preemption
occurring (in the case of CONFIG_PREEMPT=n).
Whilst this doesn't affect the stability of the system, it can certainly
affect the responsiveness and CPU availability for other tasks.
This patch splits up the user cacheflush code so that it flushes in
chunks of a page. After each chunk has been flushed, we may reschedule
if appropriate and, before processing the next chunk, we allow any
pending signals to be handled before resuming from where we left off.
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
| | | |/
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
System calls will only be restarted after signal handling if they (a)
return an error code indicating that a restart is required and (b) have
`why' set to a non-zero value, to indicate that the signal interrupted
them.
This patch leaves `why' set to a non-zero value for ARM-private syscalls
, and only zeroes it for syscalls that are not implemented.
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
| | |\ \
| | | | |
| | | | |
| | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/will/linux into devel-stable
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
flush_cache_vmap contains a dsb to ensure that any cacheflushing
operations to flush out newly written ptes have completed.
This patch adds the -ishst option to the dsb, since that is all that is
required for completing cacheflushing in the inner-shareable domain.
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
writel_relaxed and spin_unlock are both store operations, so we only
need to enforce store ordering in the dsb.
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
In a similar manner to our spinlock implementation, mcpm uses sev to
wake up cores waiting on a lock when the lock is unlocked. In order to
ensure that the final write unlocking the lock is visible, a dsb
instruction is executed immediately prior to the sev.
This patch changes these dsbs to use the -st option, since we only
require that the store unlocking the lock is made visible.
Acked-by: Nicolas Pitre <nico@linaro.org>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
When flushing the TLB at PL2 in response to remapping at stage-2 or VMID
rollover, we have a dsb instruction to ensure completion of the command
before continuing.
Since we only care about other processors for TLB invalidation, use the
inner-shareable variant of the dsb instruction instead.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
When unlocking a spinlock, we use the sev instruction to signal other
CPUs waiting on the lock. Since sev is not a memory access instruction,
we require a dsb in order to ensure that the sev is not issued ahead
of the store placing the lock in an unlocked state.
However, as sev is only concerned with other processors in a
multiprocessor system, we can restrict the scope of the preceding dsb
to the inner-shareable domain. Furthermore, we can restrict the scope to
consider only stores, since there are no independent loads on the unlock
path.
A side-effect of this change is that a spin_unlock operation no longer
forces completion of pending TLB invalidation, something which we rely
on when unlocking runqueues to ensure that CPU migration during TLB
maintenance routines doesn't cause us to continue before the operation
has completed.
This patch adds the -ishst suffix to the ARMv7 definition of dsb_sev()
and adds an inner-shareable dsb to the context-switch path when running
a preemptible, SMP, v7 kernel.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
System-wide barriers aren't required for situations where we only need
to make visibility and ordering guarantees in the inner-shareable domain
(i.e. we are not dealing with devices or potentially incoherent CPUs).
This patch changes the v7 TLB operations, coherent_user_range and
dcache_clean_area functions to user inner-shareable barriers. For cache
maintenance, only the store access type is required to ensure completion.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Our TLB invalidation routines may require a barrier before the
maintenance (in order to ensure pending page table writes are visible to
the hardware walker) and barriers afterwards (in order to ensure
completion of the maintenance and visibility in the instruction stream).
Whilst this is expensive, the cost can be reduced somewhat by reducing
the scope of the barrier instructions:
- The barrier before only needs to apply to stores (pte writes)
- Local ops are required only to affect the non-shareable domain
- Global ops are required only to affect the inner-shareable domain
This patch makes these changes for the TLB flushing code.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
On ARMv7, the memory barrier instructions take an optional `option'
field which can be used to constrain the effects of a memory barrier
based on shareability and access type.
This patch allows the caller to pass these options if required, and
updates the smp_*() barriers to request inner-shareable barriers,
affecting only stores for the _wmb variant. wmb() is also changed to
use the -st version of dsb.
Reported-by: Albin Tonnerre <albin.tonnerre@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Now that the ASID allocator doesn't require inner-shareable maintenance,
we can convert the local_bp_flush_all function to perform only
non-shareable flushing, in a similar manner to the TLB invalidation
routines.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Branch predictor maintenance is only required when we are either
changing the kernel's view of memory (switching tables completely) or
dealing with ASID rollover.
Both of these use-cases require subsequent TLB invalidation, which has
the relevant barrier instructions to ensure completion and visibility
of the maintenance, so this patch removes the instruction barrier from
[local_]flush_bp_all.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Inner-shareable TLB invalidation is typically more expensive than local
(non-shareable) invalidation, so performing the broadcasting for
local_flush_tlb_* operations is a waste of cycles and needlessly
clobbers entries in the TLBs of other CPUs.
This patch introduces __flush_tlb_* versions for many of the TLB
invalidation functions, which only respect inner-shareable variants of
the invalidation instructions when presented with the TLB_V7_UIS_FULL
flag. The local version is also inlined to prevent SMP_ON_UP kernels
from missing flushes, where the __flush variant would be called with
the UP flags.
This gains us around 0.5% in hackbench scores for a dual-core A15, but I
would expect this to improve as more cores (and clusters) are added to
the equation.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Albin Tonnerre <Albin.Tonnerre@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The kernel TLB range invalidation functions already contain dsb
instructions before and after the maintenance, so there is no need to
introduce additional barriers.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
| | |\ \ \
| | | |_|/
| | |/| |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
into devel-stable
Comments from Ard Biesheuvel:
I have included two use cases that I have been using, XOR and RAID-6
checksumming. The former gets a 60% performance boost on the NEON, the
latter over 400%.
ARM: add support for kernel mode NEON
Adds kernel_neon_begin/end (renamed from kernel_vfp_begin/end in the
previous version to de-emphasize the VFP part as VFP code that needs
software assistance is not supported currently.)
Introduces <asm/neon.h> and the Kconfig symbol KERNEL_MODE_NEON. This
has been aligned with Catalin for arm64, so any NEON code that does
not use assembly but intrinsics or the GCC vectorizer (such as my
examples) can potentially be shared between arm and arm64 archs.
ARM: move VFP init to an earlier boot stage
This is needed so the NEON is enabled when the XOR and RAID-6 algo
boot time benchmarks are run.
ARM: be strict about FP exceptions in kernel mode
This adds a check to vfp_support_entry() to flag unsupported uses of
the NEON/VFP in kernel mode. FP exceptions (bounces) are flagged as
a bug, this is because of their potentially intermittent nature.
Exceptions caused by the fact that kernel_neon_begin has not been
called are just routed through the undef handler.
ARM: crypto: add NEON accelerated XOR implementation
This is the xor_blocks() implementation built with -ftree-vectorize,
60% faster than optimized ARM code. It calls in_interrupt() to check
whether the NEON flavor can be used: this should really not be
necessary, but due to xor_blocks'squite generic nature, there is no
telling how exactly people may be using it in the real world.
lib/raid6: add ARM-NEON accelerated syndrome calculation
This is a port of the RAID-6 checksumming code in altivec.uc ported
to use NEON intrinsics. It is about 4x faster than the sequential
code.
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Rebased/reworked a patch contributed by Rob Herring that uses
NEON intrinsics to perform the RAID-6 syndrome calculations.
It uses the existing unroll.awk code to generate several
unrolled versions of which the best performing one is selected
at boot time.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
Cc: hpa@linux.intel.com
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Add a source file xor-neon.c (which is really just the reference
C implementation passed through the GCC vectorizer) and hook it
up to the XOR framework.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
In order to safely support the use of NEON instructions in
kernel mode, some precautions need to be taken:
- the userland context that may be present in the registers (even
if the NEON/VFP is currently disabled) must be stored under the
correct task (which may not be 'current' in the UP case),
- to avoid having to keep track of additional vfpstates for the
kernel side, disallow the use of NEON in interrupt context
and run with preemption disabled,
- after use, re-enable preemption and re-enable the lazy restore
machinery by disabling the NEON/VFP unit.
This patch adds the functions kernel_neon_begin() and
kernel_neon_end() which take care of the above. It also adds
the Kconfig symbol KERNEL_MODE_NEON to enable it.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The support code in vfp_support_entry does not care whether the
exception that caused it to be invoked occurred in kernel mode or
in user mode. However, neither condition that could trigger this
exception (lazy restore and VFP bounce to support code) is
currently allowable in kernel mode.
In either case, print a message describing the condition before
letting the undefined instruction handler run its course and trigger
an oops.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
In order to use the NEON unit in the kernel, we should
initialize it a bit earlier in the boot process so NEON users
that like to do a quick benchmark at load time (like the
xor_blocks or RAID-6 code) find the NEON/VFP unit already
enabled.
Replaced late_initcall() with core_initcall().
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Support UART0 debug ll on Hisilicon Hi3620 SoC & Hi3716 SoC.
Signed-off-by: Haojian Zhuang <haojian.zhuang@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Section entries are 2MB on LPAE, so the DEBUG_LL virtual address must
have the same offset in the 2MB section as the physical address. This
fixes async external aborts when DEBUG_LL is enabled on Midway.
Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
DEBUG_UNCOMPRESS was previously disallowed for Tegra due to tegra.S's
use of global data that was not linked into the decompressor. Solve this
by declaring this symbol in tegra.S when it is being built into the
decompressor. For the kernel proper, leave the declaration in
mach-tegra/common.c as explained in the comment.
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Tested-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The generic option DEBUG_LL_UART_PL01X is now used to select the UART
type for the kernel low-level debugging on the ep93xx platform. This
enables two config options to provide the physical and virtual base
address of the debug UART.
Use the generic options instead of providing platform specific options
to select the debug UART.
UART1 is selected with: DEBUG_UART_PHYS = 0x808c0000
DEBUG_UART_VIRT = 0xfedc0000
UART2 is selected with: DEBUG_UART_PHYS = 0x808d0000
DEBUG_UART_VIRT = 0xfedd0000
UART3 is selected with: DEBUG_UART_PHYS = 0x808e0000
DEBUG_UART_VIRT = 0xfede0000
The selected UART must already be initialized by the bootloader. If it
isn't setup nothing will appear (which might be desired).
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Ryan Mallon <rmallon@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The SPEAr debug code is a copy of the PL01x debugging code, so rather
than have this pointless code duplication, lets just use the standard
implementation instead.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Davinci's debugging is just a copy of the old 8250_32 code with a
different base address. Incorporate this into the generic 8250
debug code.
Acked-by: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Keystone's debugging is just a copy of the old 8250_32 code with a
different base address. Incorporate this into the generic 8250
debug code.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
| | | | |
| | | | |
| | | | |
| | | | | |
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Provide generic option choices for 8250 and PL01x UART ports; these can
now be selected by UART type rather than asking about the platform.
This means that a kernel configuration user can manually choose the
various parameters of the debug UART without resorting to the platform
having to encode the possible settings.
These two generic options are preferred over further debug entries for
these ports; the existing options which refer back to the 8250 and PL01x
ports are now considered deprecated.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|