| Commit message (Collapse) | Author | Age | Files | Lines |
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Pull IOMMU updates from Alex Williamson:
"As Joerg mentioned[1], he's out on paternity leave through the end of
the year and I'm filling in for him in the interim:
- Enforce MSI multiple IRQ alignment in AMD IOMMU
- VT-d PASID error handling fixes
- Add r8a7795 IPMMU support
- Manage runtime PM links on exynos at {add,remove}_device callbacks
- Fix Mediatek driver name to avoid conflict
- Add terminate support to qcom fault handler
- 64-bit IOVA optimizations
- Simplfy IOVA domain destruction, better use of rcache, and skip
anchor nodes on copy
- Convert to IOMMU TLB sync API in io-pgtable-arm{-v7s}
- Drop command queue lock when waiting for CMD_SYNC completion on ARM
SMMU implementations supporting MSI to cacheable memory
- iomu-vmsa cleanup inspired by missed IOTLB sync callbacks
- Fix sleeping lock with preemption disabled for RT
- Dual MMU support for TI DRA7xx DSPs
- Optional flush option on IOVA allocation avoiding overhead when
caller can try other options
[1] https://lkml.org/lkml/2017/10/22/72"
* tag 'iommu-v4.15-rc1' of git://github.com/awilliam/linux-vfio: (54 commits)
iommu/iova: Use raw_cpu_ptr() instead of get_cpu_ptr() for ->fq
iommu/mediatek: Fix driver name
iommu/ipmmu-vmsa: Hook up r8a7795 DT matching code
iommu/ipmmu-vmsa: Allow two bit SL0
iommu/ipmmu-vmsa: Make IMBUSCTR setup optional
iommu/ipmmu-vmsa: Write IMCTR twice
iommu/ipmmu-vmsa: IPMMU device is 40-bit bus master
iommu/ipmmu-vmsa: Make use of IOMMU_OF_DECLARE()
iommu/ipmmu-vmsa: Enable multi context support
iommu/ipmmu-vmsa: Add optional root device feature
iommu/ipmmu-vmsa: Introduce features, break out alias
iommu/ipmmu-vmsa: Unify ipmmu_ops
iommu/ipmmu-vmsa: Clean up struct ipmmu_vmsa_iommu_priv
iommu/ipmmu-vmsa: Simplify group allocation
iommu/ipmmu-vmsa: Unify domain alloc/free
iommu/ipmmu-vmsa: Fix return value check in ipmmu_find_group_dma()
iommu/vt-d: Clear pasid table entry when memory unbound
iommu/vt-d: Clear Page Request Overflow fault bit
iommu/vt-d: Missing checks for pasid tables if allocation fails
iommu/amd: Limit the IOVA page range to the specified addresses
...
|
| |\ \ \ \
| | | | | |
| | | | | |
| | | | | | |
'iommu/ipmmu-vmsa' and 'iommu/iova' into iommu-next-20171113.0
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
get_cpu_ptr() disabled preemption and returns the ->fq object of the
current CPU. raw_cpu_ptr() does the same except that it not disable
preemption which means the scheduler can move it to another CPU after it
obtained the per-CPU object.
In this case this is not bad because the data structure itself is
protected with a spin_lock. This change shouldn't matter however on RT
it does because the sleeping lock can't be accessed with disabled
preemption.
Cc: Joerg Roedel <joro@8bytes.org>
Cc: iommu@lists.linux-foundation.org
Reported-by: vinadhy@gmail.com
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Tie in r8a7795 features and update the IOMMU_OF_DECLARE
compat string to include the updated compat string.
Signed-off-by: Magnus Damm <damm+renesas@opensource.se>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Introduce support for two bit SL0 bitfield in IMTTBCR
by using a separate feature flag.
Signed-off-by: Magnus Damm <damm+renesas@opensource.se>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Introduce a feature to allow opt-out of setting up
IMBUSCR. The default case is unchanged.
Signed-off-by: Magnus Damm <damm+renesas@opensource.se>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Write IMCTR both in the root device and the leaf node.
To allow access of IMCTR introduce the following function:
- ipmmu_ctx_write_all()
While at it also rename context functions:
- ipmmu_ctx_read() -> ipmmu_ctx_read_root()
- ipmmu_ctx_write() -> ipmmu_ctx_write_root()
Signed-off-by: Magnus Damm <damm+renesas@opensource.se>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
The r8a7795 IPMMU supports 40-bit bus mastering. Both
the coherent DMA mask and the streaming DMA mask are
set to unlock the 40-bit address space for coherent
allocations and streaming operations.
Signed-off-by: Magnus Damm <damm+renesas@opensource.se>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Hook up IOMMU_OF_DECLARE() support in case CONFIG_IOMMU_DMA
is enabled. The only current supported case for 32-bit ARM
is disabled, however for 64-bit ARM usage of OF is required.
Signed-off-by: Magnus Damm <damm+renesas@opensource.se>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Add support for up to 8 contexts. Each context is mapped to one
domain. One domain is assigned one or more slave devices. Contexts
are allocated dynamically and slave devices are grouped together
based on which IPMMU device they are connected to. This makes slave
devices tied to the same IPMMU device share the same IOVA space.
Signed-off-by: Magnus Damm <damm+renesas@opensource.se>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Add root device handling to the IPMMU driver by allowing certain
DT compat strings to enable has_cache_leaf_nodes that in turn will
support both root devices with interrupts and leaf devices that
face the actual IPMMU consumer devices.
Signed-off-by: Magnus Damm <damm+renesas@opensource.se>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Introduce struct ipmmu_features to track various hardware
and software implementation changes inside the driver for
different kinds of IPMMU hardware. Add use_ns_alias_offset
as a first example of a feature to control if the secure
register bank offset should be used or not.
Signed-off-by: Magnus Damm <damm+renesas@opensource.se>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
The remaining difference between the ARM-specific and iommu-dma ops is
in the {add,remove}_device implementations, but even those have some
overlap and duplication. By stubbing out the few arm_iommu_*() calls,
we can get rid of the rest of the inline #ifdeffery to both simplify the
code and improve build coverage.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Now that the IPMMU instance pointer is the only thing remaining in the
private data structure, we no longer need the extra level of indirection
and can simply stash that directlty in the fwspec.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
We go through quite the merry dance in order to find masters behind the
same IPMMU instance, so that we can ensure they are grouped together.
None of which is really necessary, since the master's private data
already points to the particular IPMMU it is associated with, and that
IPMMU instance data is the perfect place to keep track of a per-instance
group directly.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
We have two implementations for ipmmu_ops->alloc depending on
CONFIG_IOMMU_DMA, the difference being whether they accept the
IOMMU_DOMAIN_DMA type or not. However, iommu_dma_get_cookie() is
guaranteed to return an error when !CONFIG_IOMMU_DMA, so if
ipmmu_domain_alloc_dma() was actually checking and handling the return
value correctly, it would behave the same as ipmmu_domain_alloc()
anyway.
Similarly for freeing; iommu_put_dma_cookie() is robust by design.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
| | | | |/
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
In case of error, the function iommu_group_get() returns NULL pointer
not ERR_PTR(). The IS_ERR() test in the return value check should be
replaced with NULL test.
Fixes: 3ae47292024f ("iommu/ipmmu-vmsa: Add new IOMMU_DOMAIN_DMA ops")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
In intel_svm_unbind_mm(), pasid table entry must be cleared during
svm free. Otherwise, hardware may be set up with a wild pointer.
Suggested-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Currently Page Request Overflow bit in IOMMU Fault Status register
is not cleared. Not clearing this bit would mean that any future
page-request is going to be automatically dropped by IOMMU.
Suggested-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
| | | |/
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
intel_svm_alloc_pasid_tables() might return an error but never be
checked by the callers. Later when intel_svm_bind_mm() is called,
there are no checks for valid pasid tables before enabling them.
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Liu, Yi L <yi.l.liu@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
There exist two Mediatek iommu drivers for the two different
generations of the device. But both drivers have the same name
"mtk-iommu". This breaks the registration of the second driver:
Error: Driver 'mtk-iommu' is already registered, aborting...
Fix this by changing the name for first generation to
"mtk-iommu-v1".
Fixes: b17336c55d89 ("iommu/mediatek: add support for mtk iommu generation one HW")
Signed-off-by: Matthias Brugger <matthias.bgg@gmail.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The extent of pages specified when applying a reserved region should
include up to the last page of the range, but not the page following
the range.
Signed-off-by: Gary R Hook <gary.hook@amd.com>
Fixes: 8d54d6c8b8f3 ('iommu/amd: Implement apply_dm_region call-back')
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This is quite useful for debugging. Currently, always TERMINATE the
translation when the fault handler returns (since this is all we need
for debugging drivers). But I expect the SVM work should eventually
let us do something more clever.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Variable flush_addr is being assigned but is never read; it
is redundant and can be removed. Cleans up the clang warning:
drivers/iommu/amd_iommu.c:2388:2: warning: Value stored to 'flush_addr'
is never read
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
| | |/
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
On an is_allocated() interrupt index, we ALIGN() the current index and
then increment it via the for loop, guaranteeing that it is no longer
aligned for alignments >1. We instead need to align the next index,
to guarantee forward progress, moving the increment-only to the case
where the index was found to be unallocated.
Fixes: 37946d95fc1a ('iommu/amd: Add align parameter to alloc_irq_index()')
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
| | |\ \ \ \ \
| | | | | | | |
| | | | | | | |
| | | | | | | | |
'x86/vt-d' and 'core' into next
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Since IOVA allocation failure is not unusual case we need to flush
CPUs' rcache in hope we will succeed in next round.
However, it is useful to decide whether we need rcache flush step because
of two reasons:
- Not scalability. On large system with ~100 CPUs iterating and flushing
rcache for each CPU becomes serious bottleneck so we may want to defer it.
- free_cpu_cached_iovas() does not care about max PFN we are interested in.
Thus we may flush our rcaches and still get no new IOVA like in the
commonly used scenario:
if (dma_limit > DMA_BIT_MASK(32) && dev_is_pci(dev))
iova = alloc_iova_fast(iovad, iova_len, DMA_BIT_MASK(32) >> shift);
if (!iova)
iova = alloc_iova_fast(iovad, iova_len, dma_limit >> shift);
1. First alloc_iova_fast() call is limited to DMA_BIT_MASK(32) to get
PCI devices a SAC address
2. alloc_iova() fails due to full 32-bit space
3. rcaches contain PFNs out of 32-bit space so free_cpu_cached_iovas()
throws entries away for nothing and alloc_iova() fails again
4. Next alloc_iova_fast() call cannot take advantage of rcache since we
have just defeated caches. In this case we pick the slowest option
to proceed.
This patch reworks flushed_rcache local flag to be additional function
argument instead and control rcache flush step. Also, it updates all users
to do the flush as the last chance.
Signed-off-by: Tomasz Nowicki <Tomasz.Nowicki@caviumnetworks.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Tested-by: Nate Watterson <nwatters@codeaurora.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Now that the core API issues its own post-unmap TLB sync call, push that
operation out from the io-pgtable-arm-v7s internals into the users. For
now, we leave the invalidation implicit in the unmap operation, since
none of the current users would benefit much from any change to that.
Note that the conversion of msm_iommu is implicit, since that apparently
has no specific TLB sync operation anyway.
CC: Yong Wu <yong.wu@mediatek.com>
CC: Rob Clark <robdclark@gmail.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Now that the core API issues its own post-unmap TLB sync call, push that
operation out from the io-pgtable-arm internals into the users. For now,
we leave the invalidation implicit in the unmap operation, since none of
the current users would benefit much from any change to that.
CC: Magnus Damm <damm+renesas@opensource.se>
CC: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Anchor nodes are not reserved IOVAs in the way that copy_reserved_iova()
cares about - while the failure from reserve_iova() is benign since the
target domain will already have its own anchor, we still don't want to
be triggering spurious warnings.
Reported-by: kernel test robot <fengguang.wu@intel.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Fixes: bb68b2fbfbd6 ('iommu/iova: Add rbtree anchor node')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
When devices with different DMA masks are using the same domain, or for
PCI devices where we usually try a speculative 32-bit allocation first,
there is a fair possibility that the top PFN of the rcache stack at any
given time may be unsuitable for the lower limit, prompting a fallback
to allocating anew from the rbtree. Consequently, we may end up
artifically increasing pressure on the 32-bit IOVA space as unused IOVAs
accumulate lower down in the rcache stacks, while callers with 32-bit
masks also impose unnecessary rbtree overhead.
In such cases, let's try a bit harder to satisfy the allocation locally
first - scanning the whole stack should still be relatively inexpensive.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
When popping a pfn from an rcache, we are currently checking it directly
against limit_pfn for viability. Since this represents iova->pfn_lo, it
is technically possible for the corresponding iova->pfn_hi to be greater
than limit_pfn. Although we generally get away with it in practice since
limit_pfn is typically a power-of-two boundary and the IOVAs are
size-aligned, it's pretty trivial to make the iova_rcache_get() path
take the allocation size into account for complete safety.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
All put_iova_domain() should have to worry about is freeing memory - by
that point the domain must no longer be live, so the act of cleaning up
doesn't need to be concurrency-safe or maintain the rbtree in a
self-consistent state. There's no need to waste time with locking or
emptying the rcache magazines, and we can just use the postorder
traversal helper to clear out the remaining rbtree entries in-place.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
The logic of __get_cached_rbnode() is a little obtuse, but then
__get_prev_node_of_cached_rbnode_or_last_node_and_update_limit_pfn()
wouldn't exactly roll off the tongue...
Now that we have the invariant that there is always a valid node to
start searching downwards from, everything gets a bit easier to follow
if we simplify that function to do what it says on the tin and return
the cached node (or anchor node as appropriate) directly. In turn, we
can then deduplicate the rb_prev() and limit_pfn logic into the main
loop itself, further reduce the amount of code under the lock, and
generally make the inner workings a bit less subtle.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Add a permanent dummy IOVA reservation to the rbtree, such that we can
always access the top of the address space instantly. The immediate
benefit is that we remove the overhead of the rb_last() traversal when
not using the cached node, but it also paves the way for further
simplifications.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Now that the cached node optimisation can apply to all allocations, the
couple of users which were playing tricks with dma_32bit_pfn in order to
benefit from it can stop doing so. Conversely, there is also no need for
all the other users to explicitly calculate a 'real' 32-bit PFN, when
init_iova_domain() can happily do that itself from the page granularity.
CC: Thierry Reding <thierry.reding@gmail.com>
CC: Jonathan Hunter <jonathanh@nvidia.com>
CC: David Airlie <airlied@linux.ie>
CC: Sudeep Dutt <sudeep.dutt@intel.com>
CC: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Zhen Lei <thunder.leizhen@huawei.com>
Tested-by: Nate Watterson <nwatters@codeaurora.org>
[rm: use iova_shift(), rewrote commit message]
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
The cached node mechanism provides a significant performance benefit for
allocations using a 32-bit DMA mask, but in the case of non-PCI devices
or where the 32-bit space is full, the loss of this benefit can be
significant - on large systems there can be many thousands of entries in
the tree, such that walking all the way down to find free space every
time becomes increasingly awful.
Maintain a similar cached node for the whole IOVA space as a superset of
the 32-bit space so that performance can remain much more consistent.
Inspired by work by Zhen Lei <thunder.leizhen@huawei.com>.
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Zhen Lei <thunder.leizhen@huawei.com>
Tested-by: Nate Watterson <nwatters@codeaurora.org>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
The mask for calculating the padding size doesn't change, so there's no
need to recalculate it every loop iteration. Furthermore, Once we've
done that, it becomes clear that we don't actually need to calculate a
padding size at all - by flipping the arithmetic around, we can just
combine the upper limit, size, and mask directly to check against the
lower limit.
For an arm64 build, this alone knocks 20% off the object code size of
the entire alloc_iova() function!
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Zhen Lei <thunder.leizhen@huawei.com>
Tested-by: Nate Watterson <nwatters@codeaurora.org>
[rm: simplified more of the arithmetic, rewrote commit message]
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Checking the IOVA bounds separately before deciding which direction to
continue the search (if necessary) results in redundantly comparing both
pfns twice each. GCC can already determine that the final comparison op
is redundant and optimise it down to 3 in total, but we can go one
further with a little tweak of the ordering (which makes the intent of
the code that much cleaner as a bonus).
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Zhen Lei <thunder.leizhen@huawei.com>
Tested-by: Nate Watterson <nwatters@codeaurora.org>
[rm: rewrote commit message to clarify]
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Variable did_old is unsigned so checking whether it is
greater or equal to zero is not necessary.
Signed-off-by: Christos Gkekas <chris.gekas@gmail.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
The notifier function will take the dmar_global_lock too, so
lockdep complains about inverse locking order when the
notifier is registered under the dmar_global_lock.
Reported-by: Jan Kiszka <jan.kiszka@siemens.com>
Fixes: 59ce0515cdaf ('iommu/vt-d: Update DRHD/RMRR/ATSR device scope caches when PCI hotplug happens')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Make use of the new alignment capability of
alloc_irq_index() to enforce IRQ index alignment
for MSI.
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Fixes: 2b324506341cb ('iommu/amd: Add routines to manage irq remapping tables')
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
| | | | | |/ /
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
For multi-MSI IRQ ranges the IRQ index needs to be aligned
to the power-of-two of the requested IRQ count. Extend the
alloc_irq_index() function to allow such an allocation.
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Fixes: 2b324506341cb ('iommu/amd: Add routines to manage irq remapping tables')
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
add_device is a bit more suitable for establishing runtime PM links than
the xlate callback. This change also makes it possible to implement proper
cleanup - in remove_device callback.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
A client user instantiates and attaches to an iommu_domain to
program the OMAP IOMMU associated with the domain. The iommus
programmed by a client user are bound with the iommu_domain
through the user's device archdata. The OMAP IOMMU driver
currently supports only one IOMMU per IOMMU domain per user.
The OMAP IOMMU driver has been enhanced to support allowing
multiple IOMMUs to be programmed by a single client user. This
support is being added mainly to handle the DSP subsystems on
the DRA7xx SoCs, which have two MMUs within the same subsystem.
These MMUs provide translations for a processor core port and
an internal EDMA port. This support allows both the MMUs to
be programmed together, but with each one retaining it's own
internal state objects. The internal EDMA block is managed by
the software running on the DSPs, and this design provides
on-par functionality with previous generation OMAP DSPs where
the EDMA and the DSP core shared the same MMU.
The multiple iommus are expected to be provided through a
sentinel terminated array of omap_iommu_arch_data objects
through the client user's device archdata. The OMAP driver
core is enhanced to loop through the array of attached
iommus and program them for all common operations. The
sentinel-terminated logic is used so as to not change the
omap_iommu_arch_data structure.
NOTE:
1. The IOMMU group and IOMMU core registration is done only for
the DSP processor core MMU even though both MMUs are represented
by their own platform device and are probed individually. The
IOMMU device linking uses this registered MMU device. The struct
iommu_device for the second MMU is not used even though memory
for it is allocated.
2. The OMAP IOMMU debugfs code still continues to operate on
individual IOMMU objects.
Signed-off-by: Suman Anna <s-anna@ti.com>
[t-kristo@ti.com: ported support to 4.13 based kernel]
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
| | | |/ / /
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
The OMAP IOMMU driver allows only a single device (eg: a rproc
device) to be attached per domain. The current attach detection
logic relies on a check for an attached iommu for the respective
client device. Change this logic to use the client device pointer
instead in preparation for supporting multiple iommu devices to be
bound to a single iommu domain, and thereby to a client device.
Signed-off-by: Suman Anna <s-anna@ti.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
While CMD_SYNC is unlikely to complete immediately such that we never go
round the polling loop, with a lightly-loaded queue it may still do so
long before the delay period is up. If we have no better completion
notifier, use similar logic as we have for SMMUv2 to spin a number of
times before each backoff, so that we have more chance of catching syncs
which complete relatively quickly and avoid delaying unnecessarily.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
We have separate (identical) timeout values for polling for a queue to
drain and waiting for an MSI to signal CMD_SYNC completion. In reality,
we only wait for the command queue to drain if we're waiting on a sync,
so just merged these two timeouts into a single constant.
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
arm_smmu_cmdq_issue_sync is a little unwieldy now that it supports both
MSI and event-based polling, so split it into two functions to make things
easier to follow.
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
As an IRQ, the CMD_SYNC interrupt is not particularly useful, not least
because we often need to wait for sync completion within someone else's
IRQ handler anyway. However, when the SMMU is both coherent and supports
MSIs, we can have a lot more fun by not using it as an interrupt at all.
Following the example suggested in the architecture and using a write
targeting normal memory, we can let callers wait on a status variable
outside the lock instead of having to stall the entire queue or even
touch MMIO registers. Since multiple sync commands are guaranteed to
complete in order, a simple incrementing sequence count is all we need
to unambiguously support any realistic number of overlapping waiters.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|