| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Installing the appropriate non-IOMMU DMA ops in arm_iommu_detch_device()
serves the case where IOMMU-aware drivers choose to control their own
mapping but still make DMA API calls, however it also affects the case
when the arch code itself tears down the mapping upon driver unbinding,
where the ops now get left in place and can inhibit arch_setup_dma_ops()
on subsequent re-probe attempts.
Fix the latter case by making sure that arch_teardown_dma_ops() cleans
up whenever the ops were automatically installed by its counterpart.
Reported-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de>
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Fixes: 1874619a7df4 "ARM: dma-mapping: Set proper DMA ops in arm_iommu_detach_device()"
Tested-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de>
Tested-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Pull DMA mapping updates from Christoph Hellwig:
"A huge update this time, but a lot of that is just consolidating or
removing code:
- provide a common DMA_MAPPING_ERROR definition and avoid indirect
calls for dma_map_* error checking
- use direct calls for the DMA direct mapping case, avoiding huge
retpoline overhead for high performance workloads
- merge the swiotlb dma_map_ops into dma-direct
- provide a generic remapping DMA consistent allocator for
architectures that have devices that perform DMA that is not cache
coherent. Based on the existing arm64 implementation and also used
for csky now.
- improve the dma-debug infrastructure, including dynamic allocation
of entries (Robin Murphy)
- default to providing chaining scatterlist everywhere, with opt-outs
for the few architectures (alpha, parisc, most arm32 variants) that
can't cope with it
- misc sparc32 dma-related cleanups
- remove the dma_mark_clean arch hook used by swiotlb on ia64 and
replace it with the generic noncoherent infrastructure
- fix the return type of dma_set_max_seg_size (Niklas Söderlund)
- move the dummy dma ops for not DMA capable devices from arm64 to
common code (Robin Murphy)
- ensure dma_alloc_coherent returns zeroed memory to avoid kernel
data leaks through userspace. We already did this for most common
architectures, but this ensures we do it everywhere.
dma_zalloc_coherent has been deprecated and can hopefully be
removed after -rc1 with a coccinelle script"
* tag 'dma-mapping-4.21' of git://git.infradead.org/users/hch/dma-mapping: (73 commits)
dma-mapping: fix inverted logic in dma_supported
dma-mapping: deprecate dma_zalloc_coherent
dma-mapping: zero memory returned from dma_alloc_*
sparc/iommu: fix ->map_sg return value
sparc/io-unit: fix ->map_sg return value
arm64: default to the direct mapping in get_arch_dma_ops
PCI: Remove unused attr variable in pci_dma_configure
ia64: only select ARCH_HAS_DMA_COHERENT_TO_PFN if swiotlb is enabled
dma-mapping: bypass indirect calls for dma-direct
vmd: use the proper dma_* APIs instead of direct methods calls
dma-direct: merge swiotlb_dma_ops into the dma_direct code
dma-direct: use dma_direct_map_page to implement dma_direct_map_sg
dma-direct: improve addressability error reporting
swiotlb: remove dma_mark_clean
swiotlb: remove SWIOTLB_MAP_ERROR
ACPI / scan: Refactor _CCA enforcement
dma-mapping: factor out dummy DMA ops
dma-mapping: always build the direct mapping code
dma-mapping: move dma_cache_sync out of line
dma-mapping: move various slow path functions out of line
...
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Arm already returns (~(dma_addr_t)0x0) on mapping failures, so we can
switch over to returning DMA_MAPPING_ERROR and let the core dma-mapping
code handle the rest.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While trying to use the dma_mmap_*() interface, it was noticed that this
interface returns strange values when passed an incorrect length.
If neither of the if() statements fire then the return value is
uninitialized. In the worst case it returns 0 which means the caller
will think the function succeeded.
Fixes: 1655cf8829d8 ("ARM: dma-mapping: Remove traces of NOMMU code")
Signed-off-by: Nathan Jones <nathanj439@gmail.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Move remaining definitions and declarations from include/linux/bootmem.h
into include/linux/memblock.h and remove the redundant header.
The includes were replaced with the semantic patch below and then
semi-automated removal of duplicated '#include <linux/memblock.h>
@@
@@
- #include <linux/bootmem.h>
+ #include <linux/memblock.h>
[sfr@canb.auug.org.au: dma-direct: fix up for the removal of linux/bootmem.h]
Link: http://lkml.kernel.org/r/20181002185342.133d1680@canb.auug.org.au
[sfr@canb.auug.org.au: powerpc: fix up for removal of linux/bootmem.h]
Link: http://lkml.kernel.org/r/20181005161406.73ef8727@canb.auug.org.au
[sfr@canb.auug.org.au: x86/kaslr, ACPI/NUMA: fix for linux/bootmem.h removal]
Link: http://lkml.kernel.org/r/20181008190341.5e396491@canb.auug.org.au
Link: http://lkml.kernel.org/r/1536927045-23536-30-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Serge Semin <fancer.lancer@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
dma_alloc_from_contiguous()
The CMA memory allocator doesn't support standard gfp flags for memory
allocation, so there is no point having it as a parameter for
dma_alloc_from_contiguous() function. Replace it by a boolean no_warn
argument, which covers all the underlaying cma_alloc() function
supports.
This will help to avoid giving false feeling that this function supports
standard gfp flags and callers can pass __GFP_ZERO to get zeroed buffer,
what has already been an issue: see commit dd65a941f6ba ("arm64:
dma-mapping: clear buffers allocated with FORCE_CONTIGUOUS flag").
Link: http://lkml.kernel.org/r/20180709122020eucas1p21a71b092975cb4a3b9954ffc63f699d1~-sqUFoa-h2939329393eucas1p2Y@eucas1p2.samsung.com
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Michał Nazarewicz <mina86@mina86.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of setting the DMA ops pointer to NULL, set the correct,
non-IOMMU ops depending on the device's coherency setting.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The kzalloc() function has a 2-factor argument form, kcalloc(). This
patch replaces cases of:
kzalloc(a * b, gfp)
with:
kcalloc(a * b, gfp)
as well as handling cases of:
kzalloc(a * b * c, gfp)
with:
kzalloc(array3_size(a, b, c), gfp)
as it's slightly less ugly than:
kzalloc_array(array_size(a, b), c, gfp)
This does, however, attempt to ignore constant size factors like:
kzalloc(4 * 1024, gfp)
though any constants defined via macros get caught up in the conversion.
Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.
The Coccinelle script used for this was:
// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@
(
kzalloc(
- (sizeof(TYPE)) * E
+ sizeof(TYPE) * E
, ...)
|
kzalloc(
- (sizeof(THING)) * E
+ sizeof(THING) * E
, ...)
)
// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@
(
kzalloc(
- sizeof(u8) * (COUNT)
+ COUNT
, ...)
|
kzalloc(
- sizeof(__u8) * (COUNT)
+ COUNT
, ...)
|
kzalloc(
- sizeof(char) * (COUNT)
+ COUNT
, ...)
|
kzalloc(
- sizeof(unsigned char) * (COUNT)
+ COUNT
, ...)
|
kzalloc(
- sizeof(u8) * COUNT
+ COUNT
, ...)
|
kzalloc(
- sizeof(__u8) * COUNT
+ COUNT
, ...)
|
kzalloc(
- sizeof(char) * COUNT
+ COUNT
, ...)
|
kzalloc(
- sizeof(unsigned char) * COUNT
+ COUNT
, ...)
)
// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@
(
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * (COUNT_ID)
+ COUNT_ID, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * COUNT_ID
+ COUNT_ID, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * (COUNT_CONST)
+ COUNT_CONST, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * COUNT_CONST
+ COUNT_CONST, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * (COUNT_ID)
+ COUNT_ID, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * COUNT_ID
+ COUNT_ID, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * (COUNT_CONST)
+ COUNT_CONST, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * COUNT_CONST
+ COUNT_CONST, sizeof(THING)
, ...)
)
// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@
- kzalloc
+ kcalloc
(
- SIZE * COUNT
+ COUNT, SIZE
, ...)
// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@
(
kzalloc(
- sizeof(TYPE) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kzalloc(
- sizeof(TYPE) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kzalloc(
- sizeof(TYPE) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kzalloc(
- sizeof(TYPE) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kzalloc(
- sizeof(THING) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kzalloc(
- sizeof(THING) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kzalloc(
- sizeof(THING) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kzalloc(
- sizeof(THING) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
)
// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@
(
kzalloc(
- sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
kzalloc(
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
kzalloc(
- sizeof(THING1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
kzalloc(
- sizeof(THING1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
kzalloc(
- sizeof(TYPE1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
|
kzalloc(
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
)
// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@
(
kzalloc(
- (COUNT) * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- COUNT * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- COUNT * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- (COUNT) * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- COUNT * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- (COUNT) * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- (COUNT) * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- COUNT * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
)
// Any remaining multi-factor products, first at least 3-factor products,
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@
(
kzalloc(C1 * C2 * C3, ...)
|
kzalloc(
- (E1) * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
|
kzalloc(
- (E1) * (E2) * E3
+ array3_size(E1, E2, E3)
, ...)
|
kzalloc(
- (E1) * (E2) * (E3)
+ array3_size(E1, E2, E3)
, ...)
|
kzalloc(
- E1 * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
)
// And then all remaining 2 factors products when they're not all constants,
// keeping sizeof() as the second factor argument.
@@
expression THING, E1, E2;
type TYPE;
constant C1, C2, C3;
@@
(
kzalloc(sizeof(THING) * C2, ...)
|
kzalloc(sizeof(TYPE) * C2, ...)
|
kzalloc(C1 * C2 * C3, ...)
|
kzalloc(C1 * C2, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * (E2)
+ E2, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * E2
+ E2, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * (E2)
+ E2, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * E2
+ E2, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- (E1) * E2
+ E1, E2
, ...)
|
- kzalloc
+ kcalloc
(
- (E1) * (E2)
+ E1, E2
, ...)
|
- kzalloc
+ kcalloc
(
- E1 * E2
+ E1, E2
, ...)
)
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Pull ARM updates from Russell King:
- Initial round of Spectre variant 1 and variant 2 fixes for 32-bit ARM
- Clang support improvements
- nommu updates for v8 MPU
- enable ARM_MODULE_PLTS by default to avoid problems loading modules
with larger kernels
- vmlinux.lds and dma-mapping cleanups
* 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: (31 commits)
ARM: spectre-v1: fix syscall entry
ARM: spectre-v1: add array_index_mask_nospec() implementation
ARM: spectre-v1: add speculation barrier (csdb) macros
ARM: KVM: report support for SMCCC_ARCH_WORKAROUND_1
ARM: KVM: Add SMCCC_ARCH_WORKAROUND_1 fast handling
ARM: spectre-v2: KVM: invalidate icache on guest exit for Brahma B15
ARM: KVM: invalidate icache on guest exit for Cortex-A15
ARM: KVM: invalidate BTB on guest exit for Cortex-A12/A17
ARM: spectre-v2: warn about incorrect context switching functions
ARM: spectre-v2: add firmware based hardening
ARM: spectre-v2: harden user aborts in kernel space
ARM: spectre-v2: add Cortex A8 and A15 validation of the IBE bit
ARM: spectre-v2: harden branch predictor on context switches
ARM: spectre: add Kconfig symbol for CPUs vulnerable to Spectre
ARM: bugs: add support for per-processor bug checking
ARM: bugs: hook processor bug checking into SMP and suspend paths
ARM: bugs: prepare processor bug infrastructure
ARM: add more CPU part numbers for Cortex and Brahma B15 CPUs
ARM: 8774/1: remove no-op macro VMLINUX_SYMBOL()
ARM: 8773/1: amba: Export amba_bustype
...
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Use vma_pages() function instead of open coding it.
Generated by scripts/coccinelle/api/vma_pages.cocci.
Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Pull dma-mapping updates from Christoph Hellwig:
- replace the force_dma flag with a dma_configure bus method. (Nipun
Gupta, although one patch is іncorrectly attributed to me due to a
git rebase bug)
- use GFP_DMA32 more agressively in dma-direct. (Takashi Iwai)
- remove PCI_DMA_BUS_IS_PHYS and rely on the dma-mapping API to do the
right thing for bounce buffering.
- move dma-debug initialization to common code, and apply a few
cleanups to the dma-debug code.
- cleanup the Kconfig mess around swiotlb selection
- swiotlb comment fixup (Yisheng Xie)
- a trivial swiotlb fix. (Dan Carpenter)
- support swiotlb on RISC-V. (based on a patch from Palmer Dabbelt)
- add a new generic dma-noncoherent dma_map_ops implementation and use
it for arc, c6x and nds32.
- improve scatterlist validity checking in dma-debug. (Robin Murphy)
- add a struct device quirk to limit the dma-mask to 32-bit due to
bridge/system issues, and switch x86 to use it instead of a local
hack for VIA bridges.
- handle devices without a dma_mask more gracefully in the dma-direct
code.
* tag 'dma-mapping-4.18' of git://git.infradead.org/users/hch/dma-mapping: (48 commits)
dma-direct: don't crash on device without dma_mask
nds32: use generic dma_noncoherent_ops
nds32: implement the unmap_sg DMA operation
nds32: consolidate DMA cache maintainance routines
x86/pci-dma: switch the VIA 32-bit DMA quirk to use the struct device flag
x86/pci-dma: remove the explicit nodac and allowdac option
x86/pci-dma: remove the experimental forcesac boot option
Documentation/x86: remove a stray reference to pci-nommu.c
core, dma-direct: add a flag 32-bit dma limits
dma-mapping: remove unused gfp_t parameter to arch_dma_alloc_attrs
dma-debug: check scatterlist segments
c6x: use generic dma_noncoherent_ops
arc: use generic dma_noncoherent_ops
arc: fix arc_dma_{map,unmap}_page
arc: fix arc_dma_sync_sg_for_{cpu,device}
arc: simplify arc_dma_sync_single_for_{cpu,device}
dma-mapping: provide a generic dma-noncoherent implementation
dma-mapping: simplify Kconfig dependencies
riscv: add swiotlb support
riscv: only enable ZONE_DMA32 for 64-bit
...
|
| |/
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Most mainstream architectures are using 65536 entries, so lets stick to
that. If someone is really desperate to override it that can still be
done through <asm/dma-mapping.h>, but I'd rather see a really good
rationale for that.
dma_debug_init is now called as a core_initcall, which for many
architectures means much earlier, and provides dma-debug functionality
earlier in the boot process. This should be safe as it only relies
on the memory allocator already being available.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts the following commits that change CMA design in MM.
3d2054ad8c2d ("ARM: CMA: avoid double mapping to the CMA area if CONFIG_HIGHMEM=y")
1d47a3ec09b5 ("mm/cma: remove ALLOC_CMA")
bad8c6c0b114 ("mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE")
Ville reported a following error on i386.
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
microcode: microcode updated early to revision 0x4, date = 2013-06-28
Initializing CPU#0
Initializing HighMem for node 0 (000377fe:00118000)
Initializing Movable for node 0 (00000001:00118000)
BUG: Bad page state in process swapper pfn:377fe
page:f53effc0 count:0 mapcount:-127 mapping:00000000 index:0x0
flags: 0x80000000()
raw: 80000000 00000000 00000000 ffffff80 00000000 00000100 00000200 00000001
page dumped because: nonzero mapcount
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.0-rc5-elk+ #145
Hardware name: Dell Inc. Latitude E5410/03VXMC, BIOS A15 07/11/2013
Call Trace:
dump_stack+0x60/0x96
bad_page+0x9a/0x100
free_pages_check_bad+0x3f/0x60
free_pcppages_bulk+0x29d/0x5b0
free_unref_page_commit+0x84/0xb0
free_unref_page+0x3e/0x70
__free_pages+0x1d/0x20
free_highmem_page+0x19/0x40
add_highpages_with_active_regions+0xab/0xeb
set_highmem_pages_init+0x66/0x73
mem_init+0x1b/0x1d7
start_kernel+0x17a/0x363
i386_start_kernel+0x95/0x99
startup_32_smp+0x164/0x168
The reason for this error is that the span of MOVABLE_ZONE is extended
to whole node span for future CMA initialization, and, normal memory is
wrongly freed here. I submitted the fix and it seems to work, but,
another problem happened.
It's so late time to fix the later problem so I decide to reverting the
series.
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Acked-by: Laura Abbott <labbott@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
CMA area is now managed by the separate zone, ZONE_MOVABLE, to fix many
MM related problems. In this implementation, if CONFIG_HIGHMEM = y,
then ZONE_MOVABLE is considered as HIGHMEM and the memory of the CMA
area is also considered as HIGHMEM. That means that they are considered
as the page without direct mapping. However, CMA area could be in a
lowmem and the memory could have direct mapping.
In ARM, when establishing a new mapping for DMA, direct mapping should
be cleared since two mapping with different cache policy could cause
unknown problem. With this patch, PageHighmem() for the CMA memory
located in lowmem returns true so that the function for DMA mapping
cannot notice whether it needs to clear direct mapping or not,
correctly. To handle this situation, this patch always clears direct
mapping for such CMA memory.
Link: http://lkml.kernel.org/r/1512114786-5085-4-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Tested-by: Tony Lindgren <tony@atomide.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Laura Abbott <lauraa@codeaurora.org>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
| |
There are no users of init_dma_coherent_pool_size() left due to
387870f ("mm: dmapool: use provided gfp flags for all
dma_alloc_coherent() calls"), so remove it.
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
|
|
|
|
|
|
|
|
|
|
| |
atomic_pool is setup once while init stage and never changed after
that, so it is good candidate for __ro_after_init.
Since we are here mark atomic_pool_size with __init_data.
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
|
|
|
|
|
|
|
|
| |
gen_pool_first_fit_order_align() does not make use of additional data,
so pass plain NULL there.
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Christoph noticed [1] that default DMA pool in current form overload
the DMA coherent infrastructure. In reply, Robin suggested [2] to
split the per-device vs. global pool interfaces, so allocation/release
from default DMA pool is driven by dma ops implementation.
This patch implements Robin's idea and provide interface to
allocate/release/mmap the default (aka global) DMA pool.
To make it clear that existing *_from_coherent routines work on
per-device pool rename them to *_from_dev_coherent.
[1] https://lkml.org/lkml/2017/7/7/370
[2] https://lkml.org/lkml/2017/7/7/431
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Suggested-by: Robin Murphy <robin.murphy@arm.com>
Tested-by: Andras Szemzo <sza@esh.hu>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
DMA operations for NOMMU case have been just factored out into
separate compilation unit, so don't keep dead code.
Tested-by: Benjamin Gaignard <benjamin.gaignard@linaro.org>
Tested-by: Andras Szemzo <sza@esh.hu>
Tested-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
|
|
|
|
|
|
|
| |
And instead wire it up as method for all the dma_map_ops instances.
Note that the code seems a little fishy for dmabounce and iommu, but
for now I'd like to preserve the existing behavior 1:1.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
|
|
|
|
| |
DMA_ERROR_CODE is going to go away, so don't rely on it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
arch_teardown_dma_ops() being the inverse of arch_setup_dma_ops()
,dma_ops should be cleared in the teardown path. Currently, only the
device's iommu mapping structures are cleared in arch_teardown_dma_ops,
but not the dma_ops. So on the next reprobe, dma_ops left in place is
stale from the first IOMMU setup, but iommu mappings has been disposed
of. This is a problem when the probe of the device is deferred and
recalled with the IOMMU probe deferral.
So for fixing this, slightly refactor by moving the code from
__arm_iommu_detach_device to arm_iommu_detach_device and cleanup
the former. This takes care of resetting the dma_ops in the teardown
path.
Fixes: 09515ef5ddad ("of/acpi: Configure dma operations at probe time for platform/amba/pci bus devices")
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Sricharan R <sricharan@codeaurora.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
arch_setup_dma_ops() is used in device probe code paths to create an
IOMMU mapping and attach it to the device. The function assumes that the
device is attached to a device-specific IOMMU instance (or at least a
device-specific TLB in a shared IOMMU instance) and thus creates a
separate mapping for every device.
On several systems (Renesas R-Car Gen2 being one of them), that
assumption is not true, and IOMMU mappings must be shared between
multiple devices. In those cases the IOMMU driver knows better than the
generic ARM dma-mapping layer and attaches mapping to devices manually
with arm_iommu_attach_device(), which sets the DMA ops for the device.
The arch_setup_dma_ops() function takes this into account and bails out
immediately if the device already has DMA ops assigned. However, the
corresponding arch_teardown_dma_ops() function, called from driver
unbind code paths (including probe deferral), will tear the mapping down
regardless of who created it. When the device is reprobed
arch_setup_dma_ops() will be called again but won't perform any
operation as the DMA ops will still be set.
We need to reset the DMA ops in arch_teardown_dma_ops() to fix this.
However, we can't do so unconditionally, as then a new mapping would be
created by arch_setup_dma_ops() when the device is reprobed, regardless
of whether the device needs to share a mapping or not. We must thus keep
track of whether arch_setup_dma_ops() created the mapping, and only in
that case tear it down in arch_teardown_dma_ops().
Keep track of that information in the dev_archdata structure. As the
structure is embedded in all instances of struct device let's not grow
it, but turn the existing dma_coherent bool field into a bitfield that
can be used for other purposes.
Fixes: 09515ef5ddad ("of/acpi: Configure dma operations at probe time for platform/amba/pci bus devices")
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU updates from Joerg Roedel:
- code optimizations for the Intel VT-d driver
- ability to switch off a previously enabled Intel IOMMU
- support for 'struct iommu_device' for OMAP, Rockchip and Mediatek
IOMMUs
- header optimizations for IOMMU core code headers and a few fixes that
became necessary in other parts of the kernel because of that
- ACPI/IORT updates and fixes
- Exynos IOMMU optimizations
- updates for the IOMMU dma-api code to bring it closer to use per-cpu
iova caches
- new command-line option to set default domain type allocated by the
iommu core code
- another command line option to allow the Intel IOMMU switched off in
a tboot environment
- ARM/SMMU: TLB sync optimisations for SMMUv2, Support for using an
IDENTITY domain in conjunction with DMA ops, Support for SMR masking,
Support for 16-bit ASIDs (was previously broken)
- various other small fixes and improvements
* tag 'iommu-updates-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (63 commits)
soc/qbman: Move dma-mapping.h include to qman_priv.h
soc/qbman: Fix implicit header dependency now causing build fails
iommu: Remove trace-events include from iommu.h
iommu: Remove pci.h include from trace/events/iommu.h
arm: dma-mapping: Don't override dma_ops in arch_setup_dma_ops()
ACPI/IORT: Fix CONFIG_IOMMU_API dependency
iommu/vt-d: Don't print the failure message when booting non-kdump kernel
iommu: Move report_iommu_fault() to iommu.c
iommu: Include device.h in iommu.h
x86, iommu/vt-d: Add an option to disable Intel IOMMU force on
iommu/arm-smmu: Return IOVA in iova_to_phys when SMMU is bypassed
iommu/arm-smmu: Correct sid to mask
iommu/amd: Fix incorrect error handling in amd_iommu_bind_pasid()
iommu: Make iommu_bus_notifier return NOTIFY_DONE rather than error code
omap3isp: Remove iommu_group related code
iommu/omap: Add iommu-group support
iommu/omap: Make use of 'struct iommu_device'
iommu/omap: Store iommu_dev pointer in arch_data
iommu/omap: Move data structures to omap-iommu.h
iommu/omap: Drop legacy-style device support
...
|
| |\ \
| | | |
| | | |
| | | | |
'arm/smmu', 'arm/core', 'x86/vt-d', 'x86/amd' and 'core' into next
|
| |/ /
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The arch_setup_dma_ops() function is in charge of setting dma_ops with a
call to set_dma_ops(). set_dma_ops() is also called from
- highbank and mvebu bus notifiers
- dmabounce (to be replaced with swiotlb)
- arm_iommu_attach_device
(arm_iommu_attach_device is itself called from IOMMU and bus master
device drivers)
To allow the arch_setup_dma_ops() call to be moved from device add time
to device probe time we must ensure that dma_ops already setup by any of
the above callers will not be overriden.
Aftering replacing dmabounce with swiotlb, converting IOMMU drivers to
of_xlate and taking care of highbank and mvebu, the workaround should be
removed.
Signed-off-by: Sricharan R <sricharan@codeaurora.org>
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Tested-by: Ralph Sennhauser <ralph.sennhauser@gmail.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
| |/
|/|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The following commit:
commit 815dd18788fe0d41899f51b91d0560279cf16b0d
Author: Bart Van Assche <bart.vanassche@sandisk.com>
Date: Fri Jan 20 13:04:04 2017 -0800
treewide: Consolidate get_dma_ops() implementations
rearranges get_dma_ops in a way that xen_dma_ops are not returned when
running on Xen anymore, dev->dma_ops is returned instead (see
arch/arm/include/asm/dma-mapping.h:get_arch_dma_ops and
include/linux/dma-mapping.h:get_dma_ops).
Fix the problem by storing dev->dma_ops in dev_archdata, and setting
dev->dma_ops to xen_dma_ops. This way, xen_dma_ops is returned naturally
by get_dma_ops. The Xen code can retrieve the original dev->dma_ops from
dev_archdata when needed. It also allows us to remove __generic_dma_ops
from common headers.
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Julien Grall <julien.grall@arm.com>
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: <stable@vger.kernel.org> [4.11+]
CC: linux@armlinux.org.uk
CC: catalin.marinas@arm.com
CC: will.deacon@arm.com
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
CC: Julien Grall <julien.grall@arm.com>
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
dma_get_sgtable() tries to create a scatterlist table containing valid
struct page pointers for the coherent memory allocation passed in to it.
However, memory can be declared via dma_declare_coherent_memory(), or
via other reservation schemes which means that coherent memory is not
guaranteed to be backed by struct pages. In such cases, the resulting
scatterlist table contains pointers to invalid pages, which causes
kernel oops later.
This patch adds detection of such memory, and refuses to create a
scatterlist table for such memory.
Reported-by: Shuah Khan <shuahkhan@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Pull ARM updates from Russell King:
- nommu updates from Afzal Mohammed cleaning up the vectors support
- allow DMA memory "mapping" for nommu Benjamin Gaignard
- fixing a correctness issue with R_ARM_PREL31 relocations in the
module linker
- add strlen() prototype for the decompressor
- support for DEBUG_VIRTUAL from Florian Fainelli
- adjusting memory bounds after memory reservations have been
registered
- unipher cache handling updates from Masahiro Yamada
- initrd and Thumb Kconfig cleanups
* 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: (23 commits)
ARM: mm: round the initrd reservation to page boundaries
ARM: mm: clean up initrd initialisation
ARM: mm: move initrd init code out of arm_memblock_init()
ARM: 8655/1: improve NOMMU definition of pgprot_*()
ARM: 8654/1: decompressor: add strlen prototype
ARM: 8652/1: cache-uniphier: clean up active way setup code
ARM: 8651/1: cache-uniphier: include <linux/errno.h> instead of <linux/types.h>
ARM: 8650/1: module: handle negative R_ARM_PREL31 addends correctly
ARM: 8649/2: nommu: remove Hivecs configuration is asm
ARM: 8648/2: nommu: display vectors base
ARM: 8647/2: nommu: dynamic exception base address setting
ARM: 8646/1: mmu: decouple VECTORS_BASE from Kconfig
ARM: 8644/1: Reduce "CPU: shutdown" message to debug level
ARM: 8641/1: treewide: Replace uses of virt_to_phys with __pa_symbol
ARM: 8640/1: Add support for CONFIG_DEBUG_VIRTUAL
ARM: 8639/1: Define KERNEL_START and KERNEL_END
ARM: 8638/1: mtd: lart: Rename partition defines to be prefixed with PART_
ARM: 8637/1: Adjust memory boundaries after reservations
ARM: 8636/1: Cleanup sanity_check_meminfo
ARM: add CPU_THUMB_CAPABLE to indicate possible Thumb support
...
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
commit ab6494f0c96f ("nommu: Add noMMU support to the DMA API") have
add CONFIG_MMU compilation flag but that prohibit to use dma_mmap_wc()
when the platform doesn't have MMU.
This patch call vm_iomap_memory() in noMMU case to test if addresses
are correct and set vma->vm_flags rather than all return an error.
Signed-off-by: Benjamin Gaignard <benjamin.gaignard@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull rdma DMA mapping updates from Doug Ledford:
"Drop IB DMA mapping code and use core DMA code instead.
Bart Van Assche noted that the ib DMA mapping code was significantly
similar enough to the core DMA mapping code that with a few changes it
was possible to remove the IB DMA mapping code entirely and switch the
RDMA stack to use the core DMA mapping code.
This resulted in a nice set of cleanups, but touched the entire tree
and has been kept separate for that reason."
* tag 'for-next-dma_ops' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (37 commits)
IB/rxe, IB/rdmavt: Use dma_virt_ops instead of duplicating it
IB/core: Remove ib_device.dma_device
nvme-rdma: Switch from dma_device to dev.parent
RDS: net: Switch from dma_device to dev.parent
IB/srpt: Modify a debug statement
IB/srp: Switch from dma_device to dev.parent
IB/iser: Switch from dma_device to dev.parent
IB/IPoIB: Switch from dma_device to dev.parent
IB/rxe: Switch from dma_device to dev.parent
IB/vmw_pvrdma: Switch from dma_device to dev.parent
IB/usnic: Switch from dma_device to dev.parent
IB/qib: Switch from dma_device to dev.parent
IB/qedr: Switch from dma_device to dev.parent
IB/ocrdma: Switch from dma_device to dev.parent
IB/nes: Remove a superfluous assignment statement
IB/mthca: Switch from dma_device to dev.parent
IB/mlx5: Switch from dma_device to dev.parent
IB/mlx4: Switch from dma_device to dev.parent
IB/i40iw: Remove a superfluous assignment statement
IB/hns: Switch from dma_device to dev.parent
...
|
| |/
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Most dma_map_ops structures are never modified. Constify these
structures such that these can be write-protected. This patch
has been generated as follows:
git grep -l 'struct dma_map_ops' |
xargs -d\\n sed -i \
-e 's/struct dma_map_ops/const struct dma_map_ops/g' \
-e 's/const struct dma_map_ops {/struct dma_map_ops {/g' \
-e 's/^const struct dma_map_ops;$/struct dma_map_ops;/' \
-e 's/const const struct dma_map_ops /const struct dma_map_ops /g';
sed -i -e 's/const \(struct dma_map_ops intel_dma_ops\)/\1/' \
$(git grep -l 'struct dma_map_ops intel_dma_ops');
sed -i -e 's/const \(struct dma_map_ops dma_iommu_ops\)/\1/' \
$(git grep -l 'struct dma_map_ops' | grep ^arch/powerpc);
sed -i -e '/^struct vmd_dev {$/,/^};$/ s/const \(struct dma_map_ops[[:blank:]]dma_ops;\)/\1/' \
-e '/^static void vmd_setup_dma_ops/,/^}$/ s/const \(struct dma_map_ops \*dest\)/\1/' \
-e 's/const \(struct dma_map_ops \*dest = \&vmd->dma_ops\)/\1/' \
drivers/pci/host/*.c
sed -i -e '/^void __init pci_iommu_alloc(void)$/,/^}$/ s/dma_ops->/intel_dma_ops./' arch/ia64/kernel/pci-dma.c
sed -i -e 's/static const struct dma_map_ops sn_dma_ops/static struct dma_map_ops sn_dma_ops/' arch/ia64/sn/pci/pci_dma.c
sed -i -e 's/(const struct dma_map_ops \*)//' drivers/misc/mic/bus/vop_bus.c
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Russell King <linux@armlinux.org.uk>
Cc: x86@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The callers of the DMA alloc functions already provide the proper
context GFP flags. Make sure to pass them through to the CMA allocator,
to make the CMA compaction context aware.
Link: http://lkml.kernel.org/r/20170127172328.18574-3-l.stach@pengutronix.de
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Alexander Graf <agraf@suse.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|/
|
|
|
|
|
|
|
|
| |
The newly added DMA_ATTR_PRIVILEGED is useful for creating mappings that
are only accessible to privileged DMA engines. Adding it to the
arm dma-mapping.c so that the ARM32 DMA IOMMU mapper can make use of it.
Signed-off-by: Sricharan R <sricharan@codeaurora.org>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
fs_initcall is definitely too late to initialize DMA-debug hash tables,
because some drivers might get probed and use DMA mapping framework
already in core_initcall. Late initialization of DMA-debug results in
false warning about accessing memory, that was not allocated, like this
one:
------------[ cut here ]------------
WARNING: CPU: 5 PID: 1 at lib/dma-debug.c:1104 check_unmap+0xa1c/0xe50
exynos-sysmmu 10a60000.sysmmu: DMA-API: device driver tries to free DMA memory it has not allocated [device
address=0x000000006ebd0000] [size=16384 bytes]
Modules linked in:
CPU: 5 PID: 1 Comm: swapper/0 Not tainted 4.9.0-rc5-00028-g39dde3d-dirty #44
Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[<c0119dd4>] (unwind_backtrace) from [<c01122bc>] (show_stack+0x20/0x24)
[<c01122bc>] (show_stack) from [<c062714c>] (dump_stack+0x84/0xa0)
[<c062714c>] (dump_stack) from [<c0132560>] (__warn+0x14c/0x180)
[<c0132560>] (__warn) from [<c01325dc>] (warn_slowpath_fmt+0x48/0x50)
[<c01325dc>] (warn_slowpath_fmt) from [<c06814f8>] (check_unmap+0xa1c/0xe50)
[<c06814f8>] (check_unmap) from [<c06819c4>] (debug_dma_unmap_page+0x98/0xc8)
[<c06819c4>] (debug_dma_unmap_page) from [<c076c3e8>] (exynos_iommu_domain_free+0x158/0x380)
[<c076c3e8>] (exynos_iommu_domain_free) from [<c0764a30>] (iommu_domain_free+0x34/0x60)
[<c0764a30>] (iommu_domain_free) from [<c011f168>] (release_iommu_mapping+0x30/0xb8)
[<c011f168>] (release_iommu_mapping) from [<c011f23c>] (arm_iommu_release_mapping+0x4c/0x50)
[<c011f23c>] (arm_iommu_release_mapping) from [<c0b061ac>] (s5p_mfc_probe+0x640/0x80c)
[<c0b061ac>] (s5p_mfc_probe) from [<c07e6750>] (platform_drv_probe+0x70/0x148)
[<c07e6750>] (platform_drv_probe) from [<c07e25c0>] (driver_probe_device+0x12c/0x6b0)
[<c07e25c0>] (driver_probe_device) from [<c07e2c6c>] (__driver_attach+0x128/0x17c)
[<c07e2c6c>] (__driver_attach) from [<c07df74c>] (bus_for_each_dev+0x88/0xc8)
[<c07df74c>] (bus_for_each_dev) from [<c07e1b6c>] (driver_attach+0x34/0x58)
[<c07e1b6c>] (driver_attach) from [<c07e1350>] (bus_add_driver+0x18c/0x32c)
[<c07e1350>] (bus_add_driver) from [<c07e4198>] (driver_register+0x98/0x148)
[<c07e4198>] (driver_register) from [<c07e5cb0>] (__platform_driver_register+0x58/0x74)
[<c07e5cb0>] (__platform_driver_register) from [<c174cb30>] (s5p_mfc_driver_init+0x1c/0x20)
[<c174cb30>] (s5p_mfc_driver_init) from [<c0102690>] (do_one_initcall+0x64/0x258)
[<c0102690>] (do_one_initcall) from [<c17014c0>] (kernel_init_freeable+0x3d0/0x4d0)
[<c17014c0>] (kernel_init_freeable) from [<c116eeb4>] (kernel_init+0x18/0x134)
[<c116eeb4>] (kernel_init) from [<c010bbd8>] (ret_from_fork+0x14/0x3c)
---[ end trace dc54c54bd3581296 ]---
This patch moves initialization of DMA-debug to core_initcall. This is
safe from the initialization perspective. dma_debug_do_init() internally calls
debugfs functions and debugfs also gets initialised at core_initcall(), and
that is earlier than arch code in the link order, so it will get initialized
just before the DMA-debug.
Reported-by: Seung-Woo Kim <sw0312.kim@samsung.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Pull dmaengine updates from Vinod Koul:
"This is bit large pile of code which bring in some nice additions:
- Error reporting: we have added a new mechanism for users of
dmaenegine to register a callback_result which tells them the
result of the dma transaction. Right now only one user (ntb) is
using it.
- As we discussed on KS mailing list and pointed out NO_IRQ has no
place in kernel, this also remove NO_IRQ from dmaengine subsystem
(both arm and ppc users)
- Support for IOMMU slave transfers and its implementation for arm.
- To get better build coverage, enable COMPILE_TEST for bunch of
driver, and fix the warning and sparse complaints on these.
- Apart from above, usual updates spread across drivers"
* tag 'dmaengine-4.9-rc1' of git://git.infradead.org/users/vkoul/slave-dma: (169 commits)
async_pq_val: fix DMA memory leak
dmaengine: virt-dma: move function declarations
dmaengine: omap-dma: Enable burst and data pack for SG
DT: dmaengine: rcar-dmac: document R8A7743/5 support
dmaengine: fsldma: Unmap region obtained by of_iomap
dmaengine: jz4780: fix resource leaks on error exit return
dma-debug: fix ia64 build, use PHYS_PFN
dmaengine: coh901318: fix integer overflow when shifting more than 32 places
dmaengine: edma: avoid uninitialized variable use
dma-mapping: fix m32r build warning
dma-mapping: fix ia64 build, use PHYS_PFN
dmaengine: ti-dma-crossbar: enable COMPILE_TEST
dmaengine: omap-dma: enable COMPILE_TEST
dmaengine: edma: enable COMPILE_TEST
dmaengine: ti-dma-crossbar: Fix of_device_id data parameter usage
dmaengine: ti-dma-crossbar: Correct type for of_find_property() third parameter
dmaengine/ARM: omap-dma: Fix the DMAengine compile test on non OMAP configs
dmaengine: edma: Rename set_bits and remove unused clear_bits helper
dmaengine: edma: Use correct type for of_find_property() third parameter
dmaengine: edma: Fix of_device_id data parameter usage (legacy vs TPCC)
...
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Add methods to map/unmap device resources addresses for dma_map_ops that
are IOMMU aware. This is needed to map a device MMIO register from a
physical address.
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
|
|/
|
|
|
|
|
|
|
|
|
|
| |
According to Documentation/printk-formats.txt when printing
a size_t variable we should use %zu or %zx format specifiers.
As we are printing a memory size value, we should better use %zu
in this case.
Reported-by: Frank Mori Hess <fmh6jj@gmail.com>
Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The dma-mapping core and the implementations do not change the DMA
attributes passed by pointer. Thus the pointer can point to const data.
However the attributes do not have to be a bitfield. Instead unsigned
long will do fine:
1. This is just simpler. Both in terms of reading the code and setting
attributes. Instead of initializing local attributes on the stack
and passing pointer to it to dma_set_attr(), just set the bits.
2. It brings safeness and checking for const correctness because the
attributes are passed by value.
Semantic patches for this change (at least most of them):
virtual patch
virtual context
@r@
identifier f, attrs;
@@
f(...,
- struct dma_attrs *attrs
+ unsigned long attrs
, ...)
{
...
}
@@
identifier r.f;
@@
f(...,
- NULL
+ 0
)
and
// Options: --all-includes
virtual patch
virtual context
@r@
identifier f, attrs;
type t;
@@
t f(..., struct dma_attrs *attrs);
@@
identifier r.f;
@@
f(...,
- NULL
+ 0
)
Link: http://lkml.kernel.org/r/1468399300-5399-2-git-send-email-k.kozlowski@samsung.com
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>
Acked-by: Mark Salter <msalter@redhat.com> [c6x]
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> [cris]
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> [drm]
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
Acked-by: Fabien Dessenne <fabien.dessenne@st.com> [bdisp]
Reviewed-by: Marek Szyprowski <m.szyprowski@samsung.com> [vb2-core]
Acked-by: David Vrabel <david.vrabel@citrix.com> [xen]
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [xen swiotlb]
Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon]
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390]
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> [avr32]
Acked-by: Vineet Gupta <vgupta@synopsys.com> [arc]
Acked-by: Robin Murphy <robin.murphy@arm.com> [arm64 and dma-iommu]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When doing dma allocation with IOMMU the __iommu_alloc_atomic() was
used even when the system was coherent. However, this function
allocates from a non-cacheable pool, which is fine when the device is
not cache coherent but won't work as expected in the device is cache
coherent. Indeed, the CPU and device must access the memory using the
same cacheability attributes.
Moreover when the devices are coherent, the mmap call must not change
the pg_prot flags in the vma struct. The arm_coherent_iommu_mmap_attrs
has been updated in the same way that it was done for the arm_dma_mmap
in commit 55af8a91640d ("ARM: 8387/1: arm/mm/dma-mapping.c: Add
arm_coherent_dma_mmap").
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a L2 cache controller is used in a system that provides hardware
coherency, the entire outer cache operations are useless, and can be
skipped. Moreover, on some systems, it is harmful as it causes
deadlocks between the Marvell coherency mechanism, the Marvell PCIe
controller and the Cortex-A9.
In the current kernel implementation, the outer cache flush range
operation is triggered by the dma_alloc function.
This operation can be take place during runtime and in some
circumstances may lead to the PCIe/PL310 deadlock on Armada 375/38x
SoCs.
This patch extends the __dma_clear_buffer() function to receive a
boolean argument related to the coherency of the system. The same
things is done for the calling functions.
Reported-by: Nadav Haklai <nadavh@marvell.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Cc: <stable@vger.kernel.org> # v3.16+
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Pull ARM updates from Russell King:
"Changes included in this pull request:
- revert pxa2xx-flash back to using ioremap_cached() and switch
memremap() to use arch_memremap_wb()
- remove pci=firmware command line argument handling
- remove unnecessary arm_dma_set_mask() implementation, the generic
implementation will do for ARM
- removal of the ARM kallsyms "hack" to work around mode switching
veneers and vectors located below PAGE_OFFSET
- tidy up build system output a little
- add L2 cache power management DT bindings
- remove duplicated local_irq_disable() in reboot paths
- handle AMBA primecell devices better at registration time with PM
domains (needed for Samsung SoCs)
- ARM specific preparation to support Keystone II kexec"
* 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 8567/1: cache-uniphier: activate ways for secondary CPUs
ARM: 8570/2: Documentation: devicetree: Add PL310 PM bindings
ARM: 8569/1: pl2x0: Add OF control of cache power management
ARM: 8568/1: reboot: remove duplicated local_irq_disable()
ARM: 8566/1: drivers: amba: properly handle devices with power domains
ARM: provide arm_has_idmap_alias() helper
ARM: kexec: remove 512MB restriction on kexec crashdump
ARM: provide improved virt_to_idmap() functionality
ARM: kexec: fix crashkernel= handling
ARM: 8557/1: specify install, zinstall, and uinstall as PHONY targets
ARM: 8562/1: suppress "include/generated/mach-types.h is up to date."
ARM: 8553/1: kallsyms: remove --page-offset command line option
ARM: 8552/1: kallsyms: remove special lower address limit for CONFIG_ARM
ARM: 8555/1: kallsyms: ignore ARM mode switching veneers
ARM: 8548/1: dma-mapping: remove arm_dma_set_mask()
ARM: 8554/1: kernel: pci: remove pci=firmware command line parameter handling
ARM: memremap: implement arch_memremap_wb()
memremap: add arch specific hook for MEMREMAP_WB mappings
mtd: pxa2xx-flash: switch back from memremap to ioremap_cached
ARM: reintroduce ioremap_cached() for creating cached I/O mappings
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
arm_dma_set_mask() implements exactly the same behavior as the fallback
that dma_set_mask() takes if the set_dma_mask op is not set. Remove it
and use that fallback instead like what is already done for
dma_get_mask().
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
As a set of driver-provided callbacks and static data, there is no
compelling reason for struct iommu_ops to be mutable in core code, so
enforce const-ness throughout.
Acked-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 19e6e5e5392b ("ARM: 8547/1: dma-mapping: store buffer
information") allocates a structure meant for internal buffer management
with the GFP flags of the buffer itself. This can trigger the following
safeguard in the slab/slub allocator:
if (unlikely(flags & GFP_SLAB_BUG_MASK)) {
pr_emerg("gfp: %un", flags & GFP_SLAB_BUG_MASK);
BUG();
}
Fix this by filtering the flags that make the slab allocator unhappy.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Acked-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Given a device which uses arm_coherent_dma_ops and on which
dev_get_cma_area(dev) returns non-NULL, the following usage of the DMA
API with gfp=0 results in memory corruption and a memory leak.
p = dma_alloc_coherent(dev, sz, &dma, 0);
if (p)
dma_free_coherent(dev, sz, p, dma);
The memory leak is because the alloc allocates using
__alloc_simple_buffer() but the free attempts
dma_release_from_contiguous() which does not do free anything since the
page is not in the CMA area.
The memory corruption is because the free calls __dma_remap() on a page
which is backed by only first level page tables. The
apply_to_page_range() + __dma_update_pte() loop ends up interpreting the
section mapping as an addresses to a second level page table and writing
the new PTE to memory which is not used by page tables.
We don't have access to the GFP flags used for allocation in the free
function. Fix this by adding allocator backends and using this
information in the free function so that we always use the correct
release routine.
Fixes: 21caf3a7 ("ARM: 8398/1: arm DMA: Fix allocation from CMA for coherent DMA")
Signed-off-by: Rabin Vincent <rabin.vincent@axis.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
|
|
|
|
|
|
| |
Keep a list of allocated DMA buffers so that we can store metadata in
alloc() which we later need in free().
Signed-off-by: Rabin Vincent <rabin.vincent@axis.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If we know that TLB efficiency will not be an issue when memory is
accessed then it's not terribly important to allocate big chunks of
memory. The whole point of allocating the big chunks was that it would
make TLB usage efficient.
As Marek Szyprowski indicated:
Please note that mapping memory with larger pages significantly
improves performance, especially when IOMMU has a little TLB
cache. This can be easily observed when multimedia devices do
processing of RGB data with 90/270 degree rotation
Image rotation is distinctly an operation that needs to bounce around
through memory, so it makes sense that TLB efficiency is important
there.
Video decoding, on the other hand, is a fairly sequential operation.
During video decoding it's not expected that we'll be jumping all over
memory. Decoding video is also pretty heavy and the TLB misses aren't a
huge deal. Presumably most HW video acceleration users of dma-mapping
will not care about huge pages and will set DMA_ATTR_ALLOC_SINGLE_PAGES.
Allocating big chunks of memory is quite expensive, especially if we're
doing it repeadly and memory is full. In one (out of tree) usage model
it is common that arm_iommu_alloc_attrs() is called 16 times in a row,
each one trying to allocate 4 MB of memory. This is called whenever the
system encounters a new video, which could easily happen while the
memory system is stressed out. In fact, on certain social media
websites that auto-play video and have infinite scrolling, it's quite
common to see not just one of these 16x4MB allocations but 2 or 3 right
after another. Asking the system even to do a small amount of extra
work to give us big chunks in this case is just not a good use of time.
Allocating big chunks of memory is also expensive indirectly. Even if
we ask the system not to do ANY extra work to allocate _our_ memory,
we're still potentially eating up all big chunks in the system.
Presumably there are other users in the system that aren't quite as
flexible and that actually need these big chunks. By eating all the big
chunks we're causing extra work for the rest of the system. We also may
start making other memory allocations fail. While the system may be
robust to such failures (as is the case with dwc2 USB trying to allocate
buffers for Ethernet data and with WiFi trying to allocate buffers for
WiFi data), it is yet another big performance hit.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The __iommu_alloc_buffer() is expected to be called to allocate pretty
sizeable buffers. Upon simple tests of video I saw it trying to
allocate 4,194,304 bytes. The function tries to allocate large chunks
in order to optimize IOMMU TLB usage.
The current function is very, very slow.
One problem is the way it keeps trying and trying to allocate big
chunks. Imagine a very fragmented memory that has 4M free but no
contiguous pages at all. Further imagine allocating 4M (1024 pages).
We'll do the following memory allocations:
- For page 1:
- Try to allocate order 10 (no retry)
- Try to allocate order 9 (no retry)
- ...
- Try to allocate order 0 (with retry, but not needed)
- For page 2:
- Try to allocate order 9 (no retry)
- Try to allocate order 8 (no retry)
- ...
- Try to allocate order 0 (with retry, but not needed)
- ...
- ...
Total number of calls to alloc() calls for this case is:
sum(int(math.log(i, 2)) + 1 for i in range(1, 1025))
=> 9228
The above is obviously worse case, but given how slow alloc can be we
really want to try to avoid even somewhat bad cases. I timed the old
code with a device under memory pressure and it wasn't hard to see it
take more than 120 seconds to allocate 4 megs of memory! (NOTE: testing
was done on kernel 3.14, so possibly mainline would behave
differently).
A second problem is that allocating big chunks under memory pressure
when we don't need them is just not a great idea anyway unless we really
need them. We can make due pretty well with smaller chunks so it's
probably wise to leave bigger chunks for other users once memory
pressure is on.
Let's adjust the allocation like this:
1. If a big chunk fails, stop trying to hard and bump down to lower
order allocations.
2. Don't try useless orders. The whole point of big chunks is to
optimize the TLB and it can really only make use of 2M, 1M, 64K and
4K sizes.
We'll still tend to eat up a bunch of big chunks, but that might be the
right answer for some users. A future patch could possibly add a new
DMA_ATTR that would let the caller decide that TLB optimization isn't
important and that we should use smaller chunks. Presumably this would
be a sane strategy for some callers.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
Tested-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are many locations that do
if (memory_was_allocated_by_vmalloc)
vfree(ptr);
else
kfree(ptr);
but kvfree() can handle both kmalloc()ed memory and vmalloc()ed memory
using is_vmalloc_addr(). Unless callers have special reasons, we can
replace this branch with kvfree(). Please check and reply if you found
problems.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Jan Kara <jack@suse.com>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Acked-by: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Acked-by: David Rientjes <rientjes@google.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Oleg Drokin <oleg.drokin@intel.com>
Cc: Boris Petkov <bp@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|