Commit message  (Author, Date, Files changed, Lines -/+)
*---. Merge branches 'debug-choice', 'devel-stable' and 'misc' into for-linus  (Russell King, 2013-09-05, 103 files, -1343/+1590)
|\ \ \
| | | * ARM: 7830/1: delay: don't bother reporting bogomips in /proc/cpuinfo  (Will Deacon, 2013-09-02, 2 files, -20/+2)
| | | |   Now that we support a timer-backed delay loop, I'm quickly getting sick and tired of people complaining that their beloved bogomips value has decreased. You know who you are!
| | | |   This patch removes the bogomips line from /proc/cpuinfo, based on the reasoning that any program parsing this is already broken and, as such, won't be further broken if the field is removed.
| | | |
| | | |   Acked-by: Nicolas Pitre <nico@linaro.org>
| | | |   Acked-by: Marc Zyngier <marc.zyngier@arm.com>
| | | |   Signed-off-by: Will Deacon <will.deacon@arm.com>
| | | |   Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| | | * ARM: 7829/1: Add ".text.unlikely" and ".text.hot" to arm unwind tables  (Douglas Anderson, 2013-09-02, 2 files, -0/+10)
| | | |   It appears that gcc may put some code in ".text.unlikely" or ".text.hot" sections. Right now those aren't accounted for in unwind tables. Add them. I found some docs about this at: http://gcc.gnu.org/onlinedocs/gcc-4.6.2/gcc.pdf
| | | |   Without this, if you have slub_debug turned on, you can get messages that look like this:
| | | |     unwind: Index not found 7f008c50
| | | |
| | | |   Signed-off-by: Doug Anderson <dianders@chromium.org>
| | | |   Acked-by: Mike Frysinger <vapier@gentoo.org>
| | | |   Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| | | * ARM: 7828/1: ARMv7-M: implement restart routine common to all v7-M machines  (Uwe Kleine-König, 2013-09-02, 3 files, -1/+32)
| | | |   The newly introduced function is to be used as .restart callback for ARMv7-M machines. The used register is architecturally defined, so it should work for all M-class machines.
| | | |
| | | |   Acked-by: Jonathan Austin <jonathan.austin@arm.com>
| | | |   Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
| | | |   Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| | | * ARM: 7823/1: errata: workaround Cortex-A15 erratum 773022  (Will Deacon, 2013-09-02, 2 files, -1/+22)
| | | |   On Cortex-A15 CPUs up to and including r0p4, in certain rare sequences of code, the loop buffer may deliver incorrect instructions. This workaround disables the loop buffer to avoid the erratum.
| | | |
| | | |   Signed-off-by: Will Deacon <will.deacon@arm.com>
| | | |   Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| | | * ARM: 7825/1: document the use of NEON in kernel mode  (Ard Biesheuvel, 2013-08-25, 1 file, -0/+121)
| | | |   Add a file to Documentation/arm explaining how kernel mode NEON is supposed to be used.
| | | |
| | | |   Reviewed-by: Nicolas Pitre <nico@linaro.org>
| | | |   Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
| | | |   Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| | | * ARM: 7824/1: update advice on kernel, initramfs and FDT load address.  (Ian Campbell, 2013-08-25, 1 file, -10/+32)
| | | |   - Recommend that the kernel be placed under 128MiB; this is required if CONFIG_AUTO_ZRELADDR=y and good (or at least not bad) advice even if CONFIG_AUTO_ZRELADDR=n.
| | | |   - Recommend that a zImage kernel be placed above 32MiB; this avoids the need to relocate prior to decompression, which can speed up boot.
| | | |   - Add basic info on the requirements when loading a raw (non-zImage) kernel, which are stricter than the zImage requirements.
| | | |   - Recommend that the DTB be placed after the 128MiB boundary, avoiding any potential conflict with the kernel decompressor, and within the lowmem region. In practice it could follow the kernel loaded after 32MiB, assuming the kernel decompresses to less than 32MiB, but the 128MiB recommendation is simple and unambiguous.
| | | |   - Add a similar recommendation regarding the initramfs.
| | | |
| | | |   Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
| | | |   Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
| | | |   Cc: Will Deacon <will.deacon@arm.com>
| | | |   Cc: Dave Martin <dave.martin@linaro.org>
| | | |   Cc: Grant Likely <grant.likely@secretlab.ca>
| | | |   Cc: Roy Franz <roy.franz@linaro.org>
| | | |   Cc: linux-arm-kernel@lists.infradead.org
| | | |   Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
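A worked example of these recommendations (all addresses hypothetical, assuming a platform whose RAM starts at physical address 0x80000000): loading zImage at 0x82000000 places it 32 MiB into RAM, so no relocation is needed before decompression, while staying under the 128 MiB limit that CONFIG_AUTO_ZRELADDR depends on; the DTB could then be loaded at 0x88000000 (128 MiB into RAM) and the initramfs just above it at 0x88100000, keeping both clear of the decompressor and within lowmem.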
| | | * ARM: 7822/1: add workaround for ambiguous C99 stdint.h types  (Ard Biesheuvel, 2013-08-25, 1 file, -0/+40)
| | | |   The C99 types uintXX_t that are usually defined in 'stdint.h' are not as unambiguous on ARM as you would expect. For the types below, there is a difference on ARM between GCC built for bare metal ARM, GCC built for glibc and the kernel itself, which results in build errors if you try to build with -ffreestanding and include 'stdint.h' (such as when you include 'arm_neon.h' in order to use NEON intrinsics)
| | | |   As the typedefs for these types in 'stdint.h' are based on builtin defines supplied by GCC, we can tweak these to align with the kernel's idea of those types, so 'linux/types.h' and 'stdint.h' can be safely included from the same source file (provided that -ffreestanding is used).
| | | |
| | | |                      int32_t   uint32_t        uintptr_t
| | | |     bare metal GCC   long      unsigned long   unsigned int
| | | |     glibc GCC        int       unsigned int    unsigned int
| | | |     kernel           int       unsigned int    unsigned long
| | | |
| | | |   Acked by: Dave Martin <dave.martin@arm.com>
| | | |   Acked-by: Nicolas Pitre <nico@linaro.org>
| | | |   Acked-by: Mikael Pettersson <mikpe@it.uu.se>
| | | |   Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
| | | |   Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
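As a rough illustration of the workaround described above (a minimal sketch, not the exact kernel header): 'stdint.h' derives int32_t, uint32_t and uintptr_t from GCC's builtin type macros, so overriding those macros makes a -ffreestanding build agree with the kernel's own definitions.

    /* Sketch only: pin the GCC builtin type macros to the kernel's view of
     * these types so that stdint.h and linux/types.h can coexist when
     * building with -ffreestanding. */
    #define __INT32_TYPE__      int
    #define __UINT32_TYPE__     unsigned int
    #define __UINTPTR_TYPE__    unsigned long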
| | | * ARM: 7821/1: DT: binding fixup to align with vendor-prefixes.txt  (Christian Daudt, 2013-08-20, 2 files, -2/+6)
| | | |   [ this is a follow-up to this discussion: http://archive.arm.linux.org.uk/lurker/message/20130730.230827.a1ceb12a.en.html ]
| | | |   This patchset renames all uses of "bcm," name bindings to "brcm," as they were done prior to knowing that brcm had already been standardized as Broadcom vendor prefix (in Documentation/devicetree/bindings/vendor-prefixes.txt). This will not cause any churn on devices because none of these bindings have made it into production yet.
| | | |
| | | |   Acked-by: Stephen Warren <swarren@nvidia.com>
| | | |   Signed-off-by: Christian Daudt <csd@broadcom.com>
| | | |   Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| | | * ARM: 7820/1: mm: cache-l2x0: Print the cache size in kB  (Fabio Estevam, 2013-08-20, 1 file, -3/+3)
| | | |   Currently we have the following output from cache-l2x0:
| | | |     l2x0: 16 ways, CACHE_ID 0x410000c7, AUX_CTRL 0x32070000, Cache size: 1048576 B
| | | |   Using kB for the cache size can improve readability a bit:
| | | |     l2x0: 16 ways, CACHE_ID 0x410000c7, AUX_CTRL 0x32070000, Cache size: 1024 kB
| | | |   While at it use pr_info.
| | | |
| | | |   Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
| | | |   Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| | | * ARM: 7818/1: feroceon: Add suspend/resume operation  (Ezequiel Garcia, 2013-08-20, 2 files, -1/+27)
| | | |   Add support for suspend/resume operations. The implemented procedures are identical to the ones for ARM926.
| | | |
| | | |   Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
| | | |   Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| | | * ARM: 7814/2: Allow forced irq threading  (Thomas Gleixner, 2013-08-20, 1 file, -0/+1)
| | | |   All timer interrupts and the perf interrupt are marked NO_THREAD, so it's safe to allow forced interrupt threading.
| | | |
| | | |   Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| | | |   Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
| | | |   Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| | | * ARM: 7813/1: Mark pmu interupt IRQF_NO_THREAD  (Thomas Gleixner, 2013-08-20, 1 file, -1/+2)
| | | |   PMU interrupts must not be threaded.
| | | |
| | | |   Acked-by: Will Deacon <will.deacon@arm.com>
| | | |   Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| | | |   Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
| | | |   Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| | | * ARM: 7808/1: KVM: mm: Get rid of L_PTE_USER ref from PAGE_S2_DEVICE  (Christoffer Dall, 2013-08-13, 2 files, -2/+1)
| | | |   The L_PTE_USER bit actually has nothing to do with stage 2 mappings, and the L_PTE_S2_RDWR value sets the readable bit, which was what L_PTE_USER was used for before proper handling of stage 2 memory defines.
| | | |
| | | |   Changelog:
| | | |   [v3]: Drop call to kvm_set_s2pte_writable in mmu.c
| | | |   [v2]: Change default mappings to be r/w instead of r/o, as per Marc Zyngier's suggestion.
| | | |
| | | |   Cc: Marc Zyngier <marc.zyngier@arm.com>
| | | |   Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| | | |   Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| | | * ARM: 7792/1: mm: Remove general hugetlb code from ARM  (Steven Capper, 2013-07-31, 2 files, -43/+3)
| | | |   General forms of huge_pte_alloc, huge_pte_offset and follow_huge_pmd are now available in mm/hugetlb.c. This patch removes the ARM copies of these functions and activates the general ones by enabling:
| | | |     CONFIG_ARCH_WANT_GENERAL_HUGETLB
| | | |
| | | |   Signed-off-by: Steve Capper <steve.capper@linaro.org>
| | | |   Acked-by: Catalin Marinas <catalin.marinas@arm.com>
| | | |   Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| | | * ARM: Allow selection HZ values  (Russell King, 2013-07-26, 1 file, -2/+38)
| | | |   Allow users to configure the HZ rate on implementations which do not have a fixed clock rate. Whenever HZ_FIXED is not set through one of its "default" settings, the user is allowed to select a tick rate from 100, 200, 250, 300, 500Hz and 1kHz. This is slightly more choice than with the generic HZ selection in kernel/Kconfig.hz (which only does 100, 250, 300Hz and 1kHz).
| | | |   The reason for including 200Hz is that a greater number of other platforms want that via the fixed rate, and 500Hz just seemed to be a better middle value than 300Hz (which is of course very close to 250).
| | | |
| | | |   Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| | | * ARM: constify machine_desc structure uses  (Russell King, 2013-07-26, 10 files, -25/+26)
| | | |   struct machine_desc records are defined everywhere as a 'const' structure, but unfortunately that const-ness is lost through the use of linker magic - the symbols which surround the section are not declared const, so it becomes possible not to use 'const' for pointers to these const structures.
| | | |   Let's fix this oversight - all pointers to these structures should be marked const too.
| | | |
| | | |   Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
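A minimal sketch of what the constification amounts to (the lookup helper below is hypothetical, for illustration only): once the linker-provided section bounds are declared const, every pointer that walks the machine_desc table must be const-qualified as well.

    #include <asm/mach/arch.h>      /* struct machine_desc */

    /* Section bounds emitted by the linker script, now declared const. */
    extern const struct machine_desc __arch_info_begin[], __arch_info_end[];

    /* Hypothetical helper: walking the table forces const pointers throughout. */
    static const struct machine_desc *find_machine_by_nr(unsigned int nr)
    {
            const struct machine_desc *m;

            for (m = __arch_info_begin; m < __arch_info_end; m++)
                    if (m->nr == nr)
                            return m;

            return NULL;
    }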
| | * | Merge branch 'for-rmk/cacheflush-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux into devel-stable  (Russell King, 2013-08-28, 4 files, -19/+65)
| | |\ \
| | | * | ARM: cacheflush: don't bother rounding to nearest vma  (Will Deacon, 2013-08-20, 1 file, -15/+2)
| | | | |   do_cache_op finds the lowest VMA contained in the specified address range and rounds the range to cover only the mapped addresses. Since commit 4542b6a0fa6b ("ARM: 7365/1: drop unused parameter from flush_cache_user_range") the VMA is not used for anything else in this code and seeing as the low-level cache flushing routines return -EFAULT if the address is not valid, there is no need for this range truncation.
| | | | |   This patch removes the VMA handling code from the cacheflushing syscall.
| | | | |
| | | | |   Signed-off-by: Will Deacon <will.deacon@arm.com>
| | | * | ARM: cacheflush: don't round address range up to nearest page  (Will Deacon, 2013-08-20, 1 file, -2/+1)
| | | | |   The flush_cache_user_range macro takes a pair of addresses describing the start and end of the virtual address range to flush. Due to an accidental oversight when flush_cache_range_user was introduced, the address range was rounded up so that the start and end addresses were page-aligned.
| | | | |   For historical reference, the interesting commits in history.git are:
| | | | |     10eacf1775e1 ("[ARM] Clean up ARM cache handling interfaces (part 1)")
| | | | |     71432e79b76b ("[ARM] Add flush_cache_user_page() for sys_cacheflush()")
| | | | |   This patch removes the alignment code, reducing the amount of flushing required for ranges that are not an exact multiple of PAGE_SIZE.
| | | | |
| | | | |   Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
| | | | |   Reported-by: Jonathan Austin <jonathan.austin@arm.com>
| | | | |   Signed-off-by: Will Deacon <will.deacon@arm.com>
| | | * | ARM: cacheflush: split user cache-flushing into interruptible chunks  (Will Deacon, 2013-08-20, 2 files, -8/+68)
| | | | |   Flushing a large, non-faulting VMA from userspace can potentially result in a long time spent flushing the cache line-by-line without preemption occurring (in the case of CONFIG_PREEMPT=n). Whilst this doesn't affect the stability of the system, it can certainly affect the responsiveness and CPU availability for other tasks.
| | | | |   This patch splits up the user cacheflush code so that it flushes in chunks of a page. After each chunk has been flushed, we may reschedule if appropriate and, before processing the next chunk, we allow any pending signals to be handled before resuming from where we left off.
| | | | |
| | | | |   Signed-off-by: Will Deacon <will.deacon@arm.com>
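The idea can be sketched roughly as follows (simplified, not the exact kernel code; it assumes a flush_cache_user_range() that returns 0 on success): the flush proceeds one page-sized chunk at a time so that pending signals and rescheduling can be serviced between chunks rather than only after the whole range.

    static int do_cache_op_in_chunks(unsigned long start, unsigned long end)
    {
            do {
                    unsigned long chunk = min_t(unsigned long, PAGE_SIZE, end - start);
                    int ret;

                    /* Bail out early; the caller can restart once the signal is handled. */
                    if (fatal_signal_pending(current))
                            return 0;

                    ret = flush_cache_user_range(start, start + chunk);
                    if (ret)
                            return ret;

                    cond_resched();         /* allow preemption between chunks */
                    start += chunk;
            } while (start < end);

            return 0;
    }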
| | | * | ARM: entry: allow ARM-private syscalls to be restarted  (Will Deacon, 2013-07-22, 1 file, -2/+2)
| | | |/
| | | |   System calls will only be restarted after signal handling if they (a) return an error code indicating that a restart is required and (b) have `why' set to a non-zero value, to indicate that the signal interrupted them.
| | | |   This patch leaves `why' set to a non-zero value for ARM-private syscalls, and only zeroes it for syscalls that are not implemented.
| | | |
| | | |   Signed-off-by: Will Deacon <will.deacon@arm.com>
| | * | Merge branch 'for-rmk/barriers' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux into devel-stable  (Russell King, 2013-08-28, 721 files, -6933/+10833)
| | |\ \
| | | * | ARM: cacheflush: use -ishst dsb variant for ensuring flush completion  (Will Deacon, 2013-08-12, 1 file, -1/+1)
| | | | |   flush_cache_vmap contains a dsb to ensure that any cacheflushing operations to flush out newly written ptes have completed. This patch adds the -ishst option to the dsb, since that is all that is required for completing cacheflushing in the inner-shareable domain.
| | | | |
| | | | |   Signed-off-by: Will Deacon <will.deacon@arm.com>
| | | * | ARM: l2x0: use -st dsb option for ordering writel_relaxed with unlock  (Will Deacon, 2013-08-12, 1 file, -1/+1)
| | | | |   writel_relaxed and spin_unlock are both store operations, so we only need to enforce store ordering in the dsb.
| | | | |
| | | | |   Signed-off-by: Will Deacon <will.deacon@arm.com>
| | | * | ARM: mcpm: use -st dsb option prior to sev instructions  (Will Deacon, 2013-08-12, 2 files, -3/+3)
| | | | |   In a similar manner to our spinlock implementation, mcpm uses sev to wake up cores waiting on a lock when the lock is unlocked. In order to ensure that the final write unlocking the lock is visible, a dsb instruction is executed immediately prior to the sev. This patch changes these dsbs to use the -st option, since we only require that the store unlocking the lock is made visible.
| | | | |
| | | | |   Acked-by: Nicolas Pitre <nico@linaro.org>
| | | | |   Reviewed-by: Dave Martin <dave.martin@arm.com>
| | | | |   Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
| | | | |   Signed-off-by: Will Deacon <will.deacon@arm.com>
| | | * | ARM: kvm: use inner-shareable barriers after TLB flushing  (Will Deacon, 2013-08-12, 2 files, -3/+3)
| | | | |   When flushing the TLB at PL2 in response to remapping at stage-2 or VMID rollover, we have a dsb instruction to ensure completion of the command before continuing. Since we only care about other processors for TLB invalidation, use the inner-shareable variant of the dsb instruction instead.
| | | | |
| | | | |   Acked-by: Marc Zyngier <marc.zyngier@arm.com>
| | | | |   Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
| | | | |   Signed-off-by: Will Deacon <will.deacon@arm.com>
| | | * | ARM: spinlock: use inner-shareable dsb variant prior to sev instruction  (Will Deacon, 2013-08-12, 2 files, -1/+11)
| | | | |   When unlocking a spinlock, we use the sev instruction to signal other CPUs waiting on the lock. Since sev is not a memory access instruction, we require a dsb in order to ensure that the sev is not issued ahead of the store placing the lock in an unlocked state.
| | | | |   However, as sev is only concerned with other processors in a multiprocessor system, we can restrict the scope of the preceding dsb to the inner-shareable domain. Furthermore, we can restrict the scope to consider only stores, since there are no independent loads on the unlock path.
| | | | |   A side-effect of this change is that a spin_unlock operation no longer forces completion of pending TLB invalidation, something which we rely on when unlocking runqueues to ensure that CPU migration during TLB maintenance routines doesn't cause us to continue before the operation has completed.
| | | | |   This patch adds the -ishst suffix to the ARMv7 definition of dsb_sev() and adds an inner-shareable dsb to the context-switch path when running a preemptible, SMP, v7 kernel.
| | | | |
| | | | |   Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
| | | | |   Signed-off-by: Will Deacon <will.deacon@arm.com>
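The resulting ARMv7 unlock helper can be sketched as follows (simplified): an inner-shareable, store-only dsb is sufficient to order the store that releases the lock before the sev that wakes up waiters.

    static inline void dsb_sev(void)
    {
            /* Order the unlocking store (inner-shareable, stores only), then signal. */
            __asm__ __volatile__("dsb ishst\n\tsev" : : : "memory");
    }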
| | | * | ARM: mm: use inner-shareable barriers for TLB and user cache operations  (Will Deacon, 2013-08-12, 3 files, -7/+7)
| | | | |   System-wide barriers aren't required for situations where we only need to make visibility and ordering guarantees in the inner-shareable domain (i.e. we are not dealing with devices or potentially incoherent CPUs).
| | | | |   This patch changes the v7 TLB operations, coherent_user_range and dcache_clean_area functions to use inner-shareable barriers. For cache maintenance, only the store access type is required to ensure completion.
| | | | |
| | | | |   Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
| | | | |   Signed-off-by: Will Deacon <will.deacon@arm.com>
| | | * | ARM: tlb: reduce scope of barrier domains for TLB invalidation  (Will Deacon, 2013-08-12, 1 file, -18/+18)
| | | | |   Our TLB invalidation routines may require a barrier before the maintenance (in order to ensure pending page table writes are visible to the hardware walker) and barriers afterwards (in order to ensure completion of the maintenance and visibility in the instruction stream).
| | | | |   Whilst this is expensive, the cost can be reduced somewhat by reducing the scope of the barrier instructions:
| | | | |   - The barrier before only needs to apply to stores (pte writes)
| | | | |   - Local ops are required only to affect the non-shareable domain
| | | | |   - Global ops are required only to affect the inner-shareable domain
| | | | |   This patch makes these changes for the TLB flushing code.
| | | | |
| | | | |   Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
| | | | |   Signed-off-by: Will Deacon <will.deacon@arm.com>
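A rough sketch of the reduced-scope pattern for a purely local (this-CPU-only) TLB invalidate, using the option-taking dsb() form introduced by the barrier patch listed further down (function name and details are illustrative, not the exact kernel code):

    static inline void local_flush_tlb_page_sketch(unsigned long uaddr)
    {
            dsb(nshst);     /* pte stores visible to the walker; non-shareable, stores only */
            asm volatile("mcr p15, 0, %0, c8, c7, 1" : : "r" (uaddr));  /* TLBIMVA, local */
            dsb(nsh);       /* wait for completion on this CPU only */
            isb();          /* ensure visibility in the instruction stream */
    }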
| | | * | ARM: barrier: allow options to be passed to memory barrier instructions  (Will Deacon, 2013-08-12, 2 files, -18/+18)
| | | | |   On ARMv7, the memory barrier instructions take an optional `option' field which can be used to constrain the effects of a memory barrier based on shareability and access type.
| | | | |   This patch allows the caller to pass these options if required, and updates the smp_*() barriers to request inner-shareable barriers, affecting only stores for the _wmb variant. wmb() is also changed to use the -st version of dsb.
| | | | |
| | | | |   Reported-by: Albin Tonnerre <albin.tonnerre@arm.com>
| | | | |   Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
| | | | |   Signed-off-by: Will Deacon <will.deacon@arm.com>
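The mechanism can be sketched as follows (simplified, not the full kernel header): the option token is pasted directly into the barrier instruction, and the commit message's mapping of the write barriers then falls out naturally.

    #define dsb(option) __asm__ __volatile__ ("dsb " #option : : : "memory")
    #define dmb(option) __asm__ __volatile__ ("dmb " #option : : : "memory")

    /* Per the commit message: store-only variants for the write barriers. */
    #define wmb()           dsb(st)         /* full system, stores only */
    #define smp_wmb()       dmb(ishst)      /* inner-shareable, stores only */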
| | | * | ARM: tlb: don't perform inner-shareable invalidation for local BP ops  (Will Deacon, 2013-08-12, 2 files, -3/+21)
| | | | |   Now that the ASID allocator doesn't require inner-shareable maintenance, we can convert the local_bp_flush_all function to perform only non-shareable flushing, in a similar manner to the TLB invalidation routines.
| | | | |
| | | | |   Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
| | | | |   Signed-off-by: Will Deacon <will.deacon@arm.com>
| | | * | ARM: tlb: don't bother with barriers for branch predictor maintenance  (Will Deacon, 2013-08-12, 1 file, -3/+4)
| | | | |   Branch predictor maintenance is only required when we are either changing the kernel's view of memory (switching tables completely) or dealing with ASID rollover. Both of these use-cases require subsequent TLB invalidation, which has the relevant barrier instructions to ensure completion and visibility of the maintenance, so this patch removes the instruction barrier from [local_]flush_bp_all.
| | | | |
| | | | |   Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
| | | | |   Signed-off-by: Will Deacon <will.deacon@arm.com>
| | | * | ARM: tlb: don't perform inner-shareable invalidation for local TLB ops  (Will Deacon, 2013-08-12, 3 files, -30/+123)
| | | | |   Inner-shareable TLB invalidation is typically more expensive than local (non-shareable) invalidation, so performing the broadcasting for local_flush_tlb_* operations is a waste of cycles and needlessly clobbers entries in the TLBs of other CPUs.
| | | | |   This patch introduces __flush_tlb_* versions for many of the TLB invalidation functions, which only respect inner-shareable variants of the invalidation instructions when presented with the TLB_V7_UIS_FULL flag. The local version is also inlined to prevent SMP_ON_UP kernels from missing flushes, where the __flush variant would be called with the UP flags.
| | | | |   This gains us around 0.5% in hackbench scores for a dual-core A15, but I would expect this to improve as more cores (and clusters) are added to the equation.
| | | | |
| | | | |   Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
| | | | |   Reported-by: Albin Tonnerre <Albin.Tonnerre@arm.com>
| | | | |   Signed-off-by: Will Deacon <will.deacon@arm.com>
| | | * | ARM: mm: remove redundant dsb() prior to range TLB invalidation  (Will Deacon, 2013-08-12, 1 file, -1/+0)
| | | | |   The kernel TLB range invalidation functions already contain dsb instructions before and after the maintenance, so there is no need to introduce additional barriers.
| | | | |
| | | | |   Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
| | | | |   Signed-off-by: Will Deacon <will.deacon@arm.com>
| | * | | Pull branch 'for-rmk' of git://git.linaro.org/people/ardbiesheuvel/linux-arm into devel-stable  (Russell King, 2013-07-22, 14 files, -2/+452)
| | |\ \ \
| | | |_|/
| | |/| |
| | | | |   Comments from Ard Biesheuvel:
| | | | |   I have included two use cases that I have been using, XOR and RAID-6 checksumming. The former gets a 60% performance boost on the NEON, the latter over 400%.
| | | | |
| | | | |   ARM: add support for kernel mode NEON
| | | | |   Adds kernel_neon_begin/end (renamed from kernel_vfp_begin/end in the previous version to de-emphasize the VFP part, as VFP code that needs software assistance is not supported currently). Introduces <asm/neon.h> and the Kconfig symbol KERNEL_MODE_NEON. This has been aligned with Catalin for arm64, so any NEON code that does not use assembly but intrinsics or the GCC vectorizer (such as my examples) can potentially be shared between arm and arm64 archs.
| | | | |
| | | | |   ARM: move VFP init to an earlier boot stage
| | | | |   This is needed so the NEON is enabled when the XOR and RAID-6 algo boot time benchmarks are run.
| | | | |
| | | | |   ARM: be strict about FP exceptions in kernel mode
| | | | |   This adds a check to vfp_support_entry() to flag unsupported uses of the NEON/VFP in kernel mode. FP exceptions (bounces) are flagged as a bug because of their potentially intermittent nature. Exceptions caused by the fact that kernel_neon_begin has not been called are just routed through the undef handler.
| | | | |
| | | | |   ARM: crypto: add NEON accelerated XOR implementation
| | | | |   This is the xor_blocks() implementation built with -ftree-vectorize, 60% faster than optimized ARM code. It calls in_interrupt() to check whether the NEON flavor can be used: this should really not be necessary, but due to xor_blocks' quite generic nature, there is no telling how exactly people may be using it in the real world.
| | | | |
| | | | |   lib/raid6: add ARM-NEON accelerated syndrome calculation
| | | | |   This is a port of the RAID-6 checksumming code in altivec.uc ported to use NEON intrinsics. It is about 4x faster than the sequential code.
| | | * | lib/raid6: add ARM-NEON accelerated syndrome calculation  (Ard Biesheuvel, 2013-07-08, 7 files, -1/+215)
| | | | |   Rebased/reworked a patch contributed by Rob Herring that uses NEON intrinsics to perform the RAID-6 syndrome calculations. It uses the existing unroll.awk code to generate several unrolled versions of which the best performing one is selected at boot time.
| | | | |
| | | | |   Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
| | | | |   Acked-by: Nicolas Pitre <nico@linaro.org>
| | | | |   Cc: hpa@linux.intel.com
| | | * | ARM: crypto: add NEON accelerated XOR implementation  (Ard Biesheuvel, 2013-07-08, 3 files, -0/+121)
| | | | |   Add a source file xor-neon.c (which is really just the reference C implementation passed through the GCC vectorizer) and hook it up to the XOR framework.
| | | | |
| | | | |   Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
| | | | |   Acked-by: Nicolas Pitre <nico@linaro.org>
| | | * | ARM: add support for kernel mode NEON  (Ard Biesheuvel, 2013-07-08, 3 files, -0/+90)
| | | | |   In order to safely support the use of NEON instructions in kernel mode, some precautions need to be taken:
| | | | |   - the userland context that may be present in the registers (even if the NEON/VFP is currently disabled) must be stored under the correct task (which may not be 'current' in the UP case),
| | | | |   - to avoid having to keep track of additional vfpstates for the kernel side, disallow the use of NEON in interrupt context and run with preemption disabled,
| | | | |   - after use, re-enable preemption and re-enable the lazy restore machinery by disabling the NEON/VFP unit.
| | | | |   This patch adds the functions kernel_neon_begin() and kernel_neon_end() which take care of the above. It also adds the Kconfig symbol KERNEL_MODE_NEON to enable it.
| | | | |
| | | | |   Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
| | | | |   Acked-by: Nicolas Pitre <nico@linaro.org>
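A hypothetical caller, for illustration only: kernel_neon_begin() saves any userland NEON/VFP state and runs the region with preemption disabled, so the NEON section must be short, must not sleep and must not be entered from interrupt context.

    #include <asm/neon.h>           /* kernel_neon_begin()/kernel_neon_end() */
    #include <linux/bug.h>
    #include <linux/hardirq.h>      /* in_interrupt() */

    static void do_neon_work(void)
    {
            if (WARN_ON(in_interrupt()))
                    return;

            kernel_neon_begin();
            /* ... NEON intrinsics or GCC auto-vectorized code ... */
            kernel_neon_end();
    }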
| | | * | ARM: be strict about FP exceptions in kernel mode  (Ard Biesheuvel, 2013-07-08, 2 files, -0/+25)
| | | | |   The support code in vfp_support_entry does not care whether the exception that caused it to be invoked occurred in kernel mode or in user mode. However, neither condition that could trigger this exception (lazy restore and VFP bounce to support code) is currently allowable in kernel mode. In either case, print a message describing the condition before letting the undefined instruction handler run its course and trigger an oops.
| | | | |
| | | | |   Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
| | | | |   Acked-by: Nicolas Pitre <nico@linaro.org>
| | | * | ARM: move VFP init to an earlier boot stage  (Ard Biesheuvel, 2013-07-08, 1 file, -1/+1)
| | | | |   In order to use the NEON unit in the kernel, we should initialize it a bit earlier in the boot process so NEON users that like to do a quick benchmark at load time (like the xor_blocks or RAID-6 code) find the NEON/VFP unit already enabled. Replaced late_initcall() with core_initcall().
| | | | |
| | | | |   Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
| | | | |   Acked-by: Nicolas Pitre <nico@linaro.org>
| * | | | ARM: 7826/1: debug: support debug ll on hisilicon soc  (Haojian Zhuang, 2013-09-02, 1 file, -0/+19)
| | | | |   Support UART0 debug ll on Hisilicon Hi3620 SoC & Hi3716 SoC.
| | | | |
| | | | |   Signed-off-by: Haojian Zhuang <haojian.zhuang@linaro.org>
| | | | |   Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| * | | | ARM: 7827/1: highbank: fix debug uart virtual address for LPAE  (Rob Herring, 2013-09-02, 1 file, -1/+1)
| | | | |   Section entries are 2MB on LPAE, so the DEBUG_LL virtual address must have the same offset in the 2MB section as the physical address. This fixes async external aborts when DEBUG_LL is enabled on Midway.
| | | | |
| | | | |   Signed-off-by: Rob Herring <rob.herring@calxeda.com>
| | | | |   Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
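To make the constraint concrete (addresses illustrative, not necessarily Midway's actual UART): with 2 MB sections the relevant offset is the address modulo 0x200000, so a UART at physical 0xfff36000 sits at offset 0x136000 within its section; a virtual address such as 0xfef36000 shares that offset (0xfef36000 mod 0x200000 = 0x136000) and works, whereas 0xfee36000 sits at offset 0x36000 and would map the wrong part of the section.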
| * | | | ARM: 7806/1: allow DEBUG_UNCOMPRESS for Tegra  (Stephen Warren, 2013-08-25, 2 files, -1/+30)
| | | | |   DEBUG_UNCOMPRESS was previously disallowed for Tegra due to tegra.S's use of global data that was not linked into the decompressor. Solve this by declaring this symbol in tegra.S when it is being built into the decompressor. For the kernel proper, leave the declaration in mach-tegra/common.c as explained in the comment.
| | | | |
| | | | |   Signed-off-by: Stephen Warren <swarren@nvidia.com>
| | | | |   Tested-by: Alexandre Courbot <acourbot@nvidia.com>
| | | | |   Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| * | | | ARM: 7793/1: debug: use generic option for ep93xx PL10x debug port  (Hartley Sweeten, 2013-08-25, 2 files, -26/+2)
| | | | |   The generic option DEBUG_LL_UART_PL01X is now used to select the UART type for the kernel low-level debugging on the ep93xx platform. This enables two config options to provide the physical and virtual base address of the debug UART. Use the generic options instead of providing platform specific options to select the debug UART.
| | | | |   UART1 is selected with: DEBUG_UART_PHYS = 0x808c0000, DEBUG_UART_VIRT = 0xfedc0000
| | | | |   UART2 is selected with: DEBUG_UART_PHYS = 0x808d0000, DEBUG_UART_VIRT = 0xfedd0000
| | | | |   UART3 is selected with: DEBUG_UART_PHYS = 0x808e0000, DEBUG_UART_VIRT = 0xfede0000
| | | | |   The selected UART must already be initialized by the bootloader. If it isn't set up, nothing will appear (which might be desired).
| | | | |
| | | | |   Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
| | | | |   Cc: Ryan Mallon <rmallon@gmail.com>
| | | | |   Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| * | | | ARM: debug: move SPEAr debug to generic PL01x code  (Russell King, 2013-08-25, 3 files, -38/+7)
| | | | |   The SPEAr debug code is a copy of the PL01x debugging code, so rather than have this pointless code duplication, let's just use the standard implementation instead.
| | | | |
| | | | |   Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| * | | | ARM: debug: move davinci debug to generic 8250 code  (Russell King, 2013-08-25, 2 files, -66/+15)
| | | | |   Davinci's debugging is just a copy of the old 8250_32 code with a different base address. Incorporate this into the generic 8250 debug code.
| | | | |
| | | | |   Acked-by: Sekhar Nori <nsekhar@ti.com>
| | | | |   Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| * | | | ARM: debug: move keystone debug to generic 8250 code  (Russell King, 2013-08-25, 2 files, -46/+8)
| | | | |   Keystone's debugging is just a copy of the old 8250_32 code with a different base address. Incorporate this into the generic 8250 debug code.
| | | | |
| | | | |   Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| * | | | ARM: debug: remove DEBUG_ROCKCHIP_UART  (Russell King, 2013-08-25, 1 file, -11/+0)
| | | | |   Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| * | | | ARM: debug: provide generic option choices for 8250 and PL01x ports  (Russell King, 2013-08-25, 1 file, -7/+37)
| | | | |   Provide generic option choices for 8250 and PL01x UART ports; these can now be selected by UART type rather than asking about the platform. This means that a kernel configuration user can manually choose the various parameters of the debug UART without resorting to the platform having to encode the possible settings.
| | | | |   These two generic options are preferred over further debug entries for these ports; the existing options which refer back to the 8250 and PL01x ports are now considered deprecated.
| | | | |
| | | | |   Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>