summaryrefslogtreecommitdiffstats
path: root/drivers/iommu
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'iommu-updates-v5.6' of ↵Linus Torvalds2020-02-0527-800/+1555
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu updates from Joerg Roedel: - Allow compiling the ARM-SMMU drivers as modules. - Fixes and cleanups for the ARM-SMMU drivers and io-pgtable code collected by Will Deacon. The merge-commit (6855d1ba7537) has all the details. - Cleanup of the iommu_put_resv_regions() call-backs in various drivers. - AMD IOMMU driver cleanups. - Update for the x2APIC support in the AMD IOMMU driver. - Preparation patches for Intel VT-d nested mode support. - RMRR and identity domain handling fixes for the Intel VT-d driver. - More small fixes and cleanups. * tag 'iommu-updates-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (87 commits) iommu/amd: Remove the unnecessary assignment iommu/vt-d: Remove unnecessary WARN_ON_ONCE() iommu/vt-d: Unnecessary to handle default identity domain iommu/vt-d: Allow devices with RMRRs to use identity domain iommu/vt-d: Add RMRR base and end addresses sanity check iommu/vt-d: Mark firmware tainted if RMRR fails sanity check iommu/amd: Remove unused struct member iommu/amd: Replace two consecutive readl calls with one readq iommu/vt-d: Don't reject Host Bridge due to scope mismatch PCI/ATS: Add PASID stubs iommu/arm-smmu-v3: Return -EBUSY when trying to re-add a device iommu/arm-smmu-v3: Improve add_device() error handling iommu/arm-smmu-v3: Use WRITE_ONCE() when changing validity of an STE iommu/arm-smmu-v3: Add second level of context descriptor table iommu/arm-smmu-v3: Prepare for handling arm_smmu_write_ctx_desc() failure iommu/arm-smmu-v3: Propagate ssid_bits iommu/arm-smmu-v3: Add support for Substream IDs iommu/arm-smmu-v3: Add context descriptor tables allocators iommu/arm-smmu-v3: Prepare arm_smmu_s1_cfg for SSID support ACPI/IORT: Parse SSID property of named component node ...
| *-----. Merge branches 'iommu/fixes', 'arm/smmu', 'x86/amd', 'x86/vt-d' and 'core' ↵Joerg Roedel2020-01-2427-800/+1555
| |\ \ \ \ | | | | | | | | | | | | | | | | | | into next
| | | | | * iommu: virtio: Use generic_iommu_put_resv_regions()Thierry Reding2019-12-231-11/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the new standard function instead of open-coding it. Cc: Jean-Philippe Brucker <jean-philippe@linaro.org> Cc: virtualization@lists.linux-foundation.org Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | * iommu: intel: Use generic_iommu_put_resv_regions()Thierry Reding2019-12-231-10/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the new standard function instead of open-coding it. Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | * iommu: amd: Use generic_iommu_put_resv_regions()Thierry Reding2019-12-231-10/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the new standard function instead of open-coding it. Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | * iommu: arm: Use generic_iommu_put_resv_regions()Thierry Reding2019-12-232-20/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the new standard function instead of open-coding it. Cc: Will Deacon <will@kernel.org> Cc: Robin Murphy <robin.murphy@arm.com> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | * iommu: Implement generic_iommu_put_resv_regions()Thierry Reding2019-12-231-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement a generic function for removing reserved regions. This can be used by drivers that don't do anything fancy with these regions other than allocating memory for them. Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | * iommu/iova: Silence warnings under memory pressureQian Cai2019-12-232-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When running heavy memory pressure workloads, this 5+ old system is throwing endless warnings below because disk IO is too slow to recover from swapping. Since the volume from alloc_iova_fast() could be large, once it calls printk(), it will trigger disk IO (writing to the log files) and pending softirqs which could cause an infinite loop and make no progress for days by the ongoimng memory reclaim. This is the counter part for Intel where the AMD part has already been merged. See the commit 3d708895325b ("iommu/amd: Silence warnings under memory pressure"). Since the allocation failure will be reported in intel_alloc_iova(), so just call dev_err_once() there because even the "ratelimited" is too much, and silence the one in alloc_iova_mem() to avoid the expensive warn_alloc(). hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed slab_out_of_memory: 66 callbacks suppressed SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC) cache: iommu_iova, object size: 40, buffer size: 448, default order: 0, min order: 0 node 0: slabs: 1822, objs: 16398, free: 0 node 1: slabs: 2051, objs: 18459, free: 31 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC) cache: iommu_iova, object size: 40, buffer size: 448, default order: 0, min order: 0 node 0: slabs: 1822, objs: 16398, free: 0 node 1: slabs: 2051, objs: 18459, free: 31 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC) cache: iommu_iova, object size: 40, buffer size: 448, default order: 0, min order: 0 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC) SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC) SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC) SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC) SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC) cache: skbuff_head_cache, object size: 208, buffer size: 640, default order: 0, min order: 0 cache: skbuff_head_cache, object size: 208, buffer size: 640, default order: 0, min order: 0 cache: skbuff_head_cache, object size: 208, buffer size: 640, default order: 0, min order: 0 cache: skbuff_head_cache, object size: 208, buffer size: 640, default order: 0, min order: 0 node 0: slabs: 697, objs: 4182, free: 0 node 0: slabs: 697, objs: 4182, free: 0 node 0: slabs: 697, objs: 4182, free: 0 node 0: slabs: 697, objs: 4182, free: 0 node 1: slabs: 381, objs: 2286, free: 27 node 1: slabs: 381, objs: 2286, free: 27 node 1: slabs: 381, objs: 2286, free: 27 node 1: slabs: 381, objs: 2286, free: 27 node 0: slabs: 1822, objs: 16398, free: 0 cache: skbuff_head_cache, object size: 208, buffer size: 640, default order: 0, min order: 0 node 1: slabs: 2051, objs: 18459, free: 31 node 0: slabs: 697, objs: 4182, free: 0 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC) node 1: slabs: 381, objs: 2286, free: 27 cache: skbuff_head_cache, object size: 208, buffer size: 640, default order: 0, min order: 0 node 0: slabs: 697, objs: 4182, free: 0 node 1: slabs: 381, objs: 2286, free: 27 hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed warn_alloc: 96 callbacks suppressed kworker/11:1H: page allocation failure: order:0, mode:0xa20(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0-1 CPU: 11 PID: 1642 Comm: kworker/11:1H Tainted: G B Hardware name: HP ProLiant XL420 Gen9/ProLiant XL420 Gen9, BIOS U19 12/27/2015 Workqueue: kblockd blk_mq_run_work_fn Call Trace: dump_stack+0xa0/0xea warn_alloc.cold.94+0x8a/0x12d __alloc_pages_slowpath+0x1750/0x1870 __alloc_pages_nodemask+0x58a/0x710 alloc_pages_current+0x9c/0x110 alloc_slab_page+0xc9/0x760 allocate_slab+0x48f/0x5d0 new_slab+0x46/0x70 ___slab_alloc+0x4ab/0x7b0 __slab_alloc+0x43/0x70 kmem_cache_alloc+0x2dd/0x450 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC) alloc_iova+0x33/0x210 cache: skbuff_head_cache, object size: 208, buffer size: 640, default order: 0, min order: 0 node 0: slabs: 697, objs: 4182, free: 0 alloc_iova_fast+0x62/0x3d1 node 1: slabs: 381, objs: 2286, free: 27 intel_alloc_iova+0xce/0xe0 intel_map_sg+0xed/0x410 scsi_dma_map+0xd7/0x160 scsi_queue_rq+0xbf7/0x1310 blk_mq_dispatch_rq_list+0x4d9/0xbc0 blk_mq_sched_dispatch_requests+0x24a/0x300 __blk_mq_run_hw_queue+0x156/0x230 blk_mq_run_work_fn+0x3b/0x40 process_one_work+0x579/0xb90 worker_thread+0x63/0x5b0 kthread+0x1e6/0x210 ret_from_fork+0x3a/0x50 Mem-Info: active_anon:2422723 inactive_anon:361971 isolated_anon:34403 active_file:2285 inactive_file:1838 isolated_file:0 unevictable:0 dirty:1 writeback:5 unstable:0 slab_reclaimable:13972 slab_unreclaimable:453879 mapped:2380 shmem:154 pagetables:6948 bounce:0 free:19133 free_pcp:7363 free_cma:0 Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | * iommu: Fix Kconfig indentationKrzysztof Kozlowski2019-12-231-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adjust indentation from spaces to tab (+optional two spaces) as in coding style with command like: $ sed -e 's/^ /\t/' -i */Kconfig Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | * | iommu/vt-d: Remove unnecessary WARN_ON_ONCE()Lu Baolu2020-01-241-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Address field in device TLB invalidation descriptor is qualified by the S field. If S field is zero, a single page at page address specified by address [63:12] is requested to be invalidated. If S field is set, the least significant bit in the address field with value 0b (say bit N) indicates the invalidation address range. The spec doesn't require the address [N - 1, 0] to be cleared, hence remove the unnecessary WARN_ON_ONCE(). Otherwise, the caller might set "mask = MAX_AGAW_PFN_WIDTH" in order to invalidating all the cached mappings on an endpoint, and below overflow error will be triggered. [...] UBSAN: Undefined behaviour in drivers/iommu/dmar.c:1354:3 shift exponent 64 is too large for 64-bit type 'long long unsigned int' [...] Reported-and-tested-by: Frank <fgndev@posteo.de> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | * | iommu/vt-d: Unnecessary to handle default identity domainLu Baolu2020-01-241-7/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The iommu default domain framework has been designed to take care of setting identity default domain type. It's unnecessary to handle this again in the VT-d driver. Hence, remove it. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | * | iommu/vt-d: Allow devices with RMRRs to use identity domainLu Baolu2020-01-241-13/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit ea2447f700cab ("intel-iommu: Prevent devices with RMRRs from being placed into SI Domain"), the Intel IOMMU driver doesn't allow any devices with RMRR locked to use the identity domain. This was added to to fix the issue where the RMRR info for devices being placed in and out of the identity domain gets lost. This identity maps all RMRRs when setting up the identity domain, so that devices with RMRRs could also use it. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | * | iommu/vt-d: Add RMRR base and end addresses sanity checkBarret Rhoden2020-01-241-1/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The VT-d spec specifies requirements for the RMRR entries base and end (called 'Limit' in the docs) addresses. This commit will cause the DMAR processing to mark the firmware as tainted if any RMRR entries that do not meet these requirements. Signed-off-by: Barret Rhoden <brho@google.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | * | iommu/vt-d: Mark firmware tainted if RMRR fails sanity checkBarret Rhoden2020-01-241-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | RMRR entries describe memory regions that are DMA targets for devices outside the kernel's control. RMRR entries that fail the sanity check are pointing to regions of memory that the firmware did not tell the kernel are reserved or otherwise should not be used. Instead of aborting DMAR processing, this commit marks the firmware as tainted. These RMRRs will still be identity mapped, otherwise, some devices, e.x. graphic devices, will not work during boot. Signed-off-by: Barret Rhoden <brho@google.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Fixes: f036c7fa0ab60 ("iommu/vt-d: Check VT-d RMRR region in BIOS is reported as reserved") Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | * | iommu/vt-d: Don't reject Host Bridge due to scope mismatchjimyan2020-01-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On a system with two host bridges(0000:00:00.0,0000:80:00.0), iommu initialization fails with DMAR: Device scope type does not match for 0000:80:00.0 This is because the DMAR table reports this device as having scope 2 (ACPI_DMAR_SCOPE_TYPE_BRIDGE): but the device has a type 0 PCI header: 80:00.0 Class 0600: Device 8086:2020 (rev 06) 00: 86 80 20 20 47 05 10 00 06 00 00 06 10 00 00 00 10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 00 00 30: 00 00 00 00 90 00 00 00 00 00 00 00 00 01 00 00 VT-d works perfectly on this system, so there's no reason to bail out on initialization due to this apparent scope mismatch. Add the class 0x06 ("PCI_BASE_CLASS_BRIDGE") as a heuristic for allowing DMAR initialization for non-bridge PCI devices listed with scope bridge. Signed-off-by: jimyan <jimyan@baidu.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Reviewed-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | * | iommu/vt-d: debugfs: Add support to show page table internalsLu Baolu2020-01-072-2/+77
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Export page table internals of the domain attached to each device. Example of such dump on a Skylake machine: $ sudo cat /sys/kernel/debug/iommu/intel/domain_translation_struct [ ... ] Device 0000:00:14.0 with pasid 0 @0x15f3d9000 IOVA_PFN PML5E PML4E 0x000000008ced0 | 0x0000000000000000 0x000000015f3da003 0x000000008ced1 | 0x0000000000000000 0x000000015f3da003 0x000000008ced2 | 0x0000000000000000 0x000000015f3da003 0x000000008ced3 | 0x0000000000000000 0x000000015f3da003 0x000000008ced4 | 0x0000000000000000 0x000000015f3da003 0x000000008ced5 | 0x0000000000000000 0x000000015f3da003 0x000000008ced6 | 0x0000000000000000 0x000000015f3da003 0x000000008ced7 | 0x0000000000000000 0x000000015f3da003 0x000000008ced8 | 0x0000000000000000 0x000000015f3da003 0x000000008ced9 | 0x0000000000000000 0x000000015f3da003 PDPE PDE PTE 0x000000015f3db003 0x000000015f3dc003 0x000000008ced0003 0x000000015f3db003 0x000000015f3dc003 0x000000008ced1003 0x000000015f3db003 0x000000015f3dc003 0x000000008ced2003 0x000000015f3db003 0x000000015f3dc003 0x000000008ced3003 0x000000015f3db003 0x000000015f3dc003 0x000000008ced4003 0x000000015f3db003 0x000000015f3dc003 0x000000008ced5003 0x000000015f3db003 0x000000015f3dc003 0x000000008ced6003 0x000000015f3db003 0x000000015f3dc003 0x000000008ced7003 0x000000015f3db003 0x000000015f3dc003 0x000000008ced8003 0x000000015f3db003 0x000000015f3dc003 0x000000008ced9003 [ ... ] Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | * | iommu/vt-d: Use iova over first levelLu Baolu2020-01-071-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After we make all map/unmap paths support first level page table. Let's turn it on if hardware supports scalable mode. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | * | iommu/vt-d: Update first level super page capabilityLu Baolu2020-01-071-5/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | First-level translation may map input addresses to 4-KByte pages, 2-MByte pages, or 1-GByte pages. Support for 4-KByte pages and 2-Mbyte pages are mandatory for first-level translation. Hardware support for 1-GByte page is reported through the FL1GP field in the Capability Register. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | * | iommu/vt-d: Make first level IOVA canonicalLu Baolu2020-01-071-2/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | First-level translation restricts the input-address to a canonical address (i.e., address bits 63:N have the same value as address bit [N-1], where N is 48-bits with 4-level paging and 57-bits with 5-level paging). (section 3.6 in the spec) This makes first level IOVA canonical by using IOVA with bit [N-1] always cleared. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | * | iommu/vt-d: Flush PASID-based iotlb for iova over first levelLu Baolu2020-01-072-15/+82
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When software has changed first-level tables, it should invalidate the affected IOTLB and the paging-structure-caches using the PASID- based-IOTLB Invalidate Descriptor defined in spec 6.5.2.4. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | * | iommu/vt-d: Setup pasid entries for iova over first levelLu Baolu2020-01-071-5/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Intel VT-d in scalable mode supports two types of page tables for IOVA translation: first level and second level. The IOMMU driver can choose one from both for IOVA translation according to the use case. This sets up the pasid entry if a domain is selected to use the first-level page table for iova translation. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | * | iommu/vt-d: Add PASID_FLAG_FL5LP for first-level pasid setupLu Baolu2020-01-073-7/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current intel_pasid_setup_first_level() use 5-level paging for first level translation if CPUs use 5-level paging mode too. This makes sense for SVA usages since the page table is shared between CPUs and IOMMUs. But it makes no sense if we only want to use first level for IOVA translation. Add PASID_FLAG_FL5LP bit in the flags which indicates whether the 5-level paging mode should be used. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | * | iommu/vt-d: Add set domain DOMAIN_ATTR_NESTING attrLu Baolu2020-01-071-0/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds the Intel VT-d specific callback of setting DOMAIN_ATTR_NESTING domain attribution. It is necessary to let the VT-d driver know that the domain represents a virtual machine which requires the IOMMU hardware to support nested translation mode. Return success if the IOMMU hardware suports nested mode, otherwise failure. Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | * | iommu/vt-d: Identify domains using first level page tableLu Baolu2020-01-071-0/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This checks whether a domain should use the first level page table for map/unmap and marks it in the domain structure. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | * | iommu/vt-d: Loose requirement for flush queue initializatonLu Baolu2020-01-071-8/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently if flush queue initialization fails, we return error or enforce the system-wide strict mode. These are unnecessary because we always check the existence of a flush queue before queuing any iova's for lazy flushing. Printing a informational message is enough. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | * | iommu/vt-d: Avoid iova flush queue in strict modeLu Baolu2020-01-071-9/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If Intel IOMMU strict mode is enabled by users, it's unnecessary to create the iova flush queue. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | * | iommu/vt-d: trace: Extend map_sg trace eventLu Baolu2020-01-071-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current map_sg stores trace message in a coarse manner. This extends it so that more detailed messages could be traced. The map_sg trace message looks like: map_sg: dev=0000:00:17.0 [1/9] dev_addr=0xf8f90000 phys_addr=0x158051000 size=4096 map_sg: dev=0000:00:17.0 [2/9] dev_addr=0xf8f91000 phys_addr=0x15a858000 size=4096 map_sg: dev=0000:00:17.0 [3/9] dev_addr=0xf8f92000 phys_addr=0x15aa13000 size=4096 map_sg: dev=0000:00:17.0 [4/9] dev_addr=0xf8f93000 phys_addr=0x1570f1000 size=8192 map_sg: dev=0000:00:17.0 [5/9] dev_addr=0xf8f95000 phys_addr=0x15c6d0000 size=4096 map_sg: dev=0000:00:17.0 [6/9] dev_addr=0xf8f96000 phys_addr=0x157194000 size=4096 map_sg: dev=0000:00:17.0 [7/9] dev_addr=0xf8f97000 phys_addr=0x169552000 size=4096 map_sg: dev=0000:00:17.0 [8/9] dev_addr=0xf8f98000 phys_addr=0x169dde000 size=4096 map_sg: dev=0000:00:17.0 [9/9] dev_addr=0xf8f99000 phys_addr=0x148351000 size=4096 Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | * | iommu/vt-d: Misc macro clean up for SVMJacob Pan2020-01-071-40/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use combined macros for_each_svm_dev() to simplify SVM device iteration and error checking. Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | * | iommu/vt-d: Avoid sending invalid page responseJacob Pan2020-01-071-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Page responses should only be sent when last page in group (LPIG) or private data is present in the page request. This patch avoids sending invalid descriptors. Fixes: 5d308fc1ecf53 ("iommu/vt-d: Add 256-bit invalidation descriptor support") Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | * | iommu/vt-d: Replace Intel specific PASID allocator with IOASIDJacob Pan2020-01-074-56/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make use of generic IOASID code to manage PASID allocation, free, and lookup. Replace Intel specific code. Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | * | iommu/vt-d: Fix off-by-one in PASID allocationJacob Pan2020-01-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PASID allocator uses IDR which is exclusive for the end of the allocation range. There is no need to decrement pasid_max. Fixes: af39507305fb ("iommu/vt-d: Apply global PASID in SVA") Reported-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | * | iommu/vt-d: Avoid duplicated code for PASID setupJacob Pan2020-01-071-30/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After each setup for PASID entry, related translation caches must be flushed. We can combine duplicated code into one function which is less error prone. Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | * | iommu/vt-d: Reject SVM bind for failed capability checkJacob Pan2020-01-071-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a check during SVM bind to ensure CPU and IOMMU hardware capabilities are met. Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | * | iommu/vt-d: Match CPU and IOMMU paging modeJacob Pan2020-01-071-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When setting up first level page tables for sharing with CPU, we need to ensure IOMMU can support no less than the levels supported by the CPU. It is not adequate, as in the current code, to set up 5-level paging in PASID entry First Level Paging Mode(FLPM) solely based on CPU. Currently, intel_pasid_setup_first_level() is only used by native SVM code which already checks paging mode matches. However, future use of this helper function may not be limited to native SVM. https://lkml.org/lkml/2019/11/18/1037 Fixes: 437f35e1cd4c8 ("iommu/vt-d: Add first level page table interface") Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | * | iommu/vt-d: Fix CPU and IOMMU SVM feature matching checksJacob Pan2020-01-072-21/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Shared Virtual Memory(SVM) is based on a collective set of hardware features detected at runtime. There are requirements for matching CPU and IOMMU capabilities. The current code checks CPU and IOMMU feature set for SVM support but the result is never stored nor used. Therefore, SVM can still be used even when these checks failed. The consequences can be: 1. CPU uses 5-level paging mode for virtual address of 57 bits, but IOMMU can only support 4-level paging mode with 48 bits address for DMA. 2. 1GB page size is used by CPU but IOMMU does not support it. VT-d unrecoverable faults may be generated. The best solution to fix these problems is to prevent them in the first place. This patch consolidates code for checking PASID, CPU vs. IOMMU paging mode compatibility, as well as provides specific error messages for each failed checks. On sane hardware configurations, these error message shall never appear in kernel log. Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | * | iommu/vt-d: Add Kconfig option to enable/disable scalable modeLu Baolu2020-01-072-1/+18
| | | | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds Kconfig option INTEL_IOMMU_SCALABLE_MODE_DEFAULT_ON to make it easier for distributions to enable or disable the Intel IOMMU scalable mode by default during kernel build. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | * | iommu/amd: Remove the unnecessary assignmentAdrian Huang2020-01-241-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The assignment of the global variable 'iommu_detected' has been moved from amd_iommu_init_dma_ops() to amd_iommu_detect(), so this patch removes the assignment in amd_iommu_init_dma_ops(). Signed-off-by: Adrian Huang <ahuang12@lenovo.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | * | iommu/amd: Remove unused struct memberAdrian Huang2020-01-171-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit c805b428f206 ("iommu/amd: Remove amd_iommu_pd_list") removes the global list for the allocated protection domains. The corresponding member 'list' of the protection_domain struct is not used anymore, so it can be removed. Signed-off-by: Adrian Huang <ahuang12@lenovo.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | * | iommu/amd: Replace two consecutive readl calls with one readqAdrian Huang2020-01-171-5/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Optimize the reigster reading by using readq instead of the two consecutive readl calls. Signed-off-by: Adrian Huang <ahuang12@lenovo.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | * | iommu/amd: Fix typos for PPR macrosAdrian Huang2020-01-072-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The bit 13 and bit 14 of the IOMMU control register are PPRLogEn and PPRIntEn. They are related to PPR (Peripheral Page Request) instead of 'PPF'. Fix them accrodingly. Signed-off-by: Adrian Huang <ahuang12@lenovo.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | * | iommu/amd: Remove local variablesAdrian Huang2020-01-071-5/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The usage of the local variables 'range' and 'misc' was removed from commit 226e889b20a9 ("iommu/amd: Remove first/last_device handling") and commit 23c742db2171 ("iommu/amd: Split out PCI related parts of IOMMU initialization"). So, remove them accrodingly. Signed-off-by: Adrian Huang <ahuang12@lenovo.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | * | iommu/amd: Remove unused variableJoerg Roedel2019-12-231-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The iommu variable in set_device_exclusion_range() us unused now and causes a compiler warning. Remove it. Fixes: 387caf0b759a ("iommu/amd: Treat per-device exclusion ranges as r/w unity-mapped regions") Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | * | iommu/amd: Only support x2APIC with IVHD type 11h/40hSuravee Suthikulpanit2019-12-232-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current implementation for IOMMU x2APIC support makes use of the MMIO access to MSI capability block registers, which requires checking EFR[MsiCapMmioSup]. However, only IVHD type 11h/40h contain the information, and not in the IVHD type 10h IOMMU feature reporting field. Since the BIOS in newer systems, which supports x2APIC, would normally contain IVHD type 11h/40h, remove the IOMMU_FEAT_XTSUP_SHIFT check for IVHD type 10h, and only support x2APIC with IVHD type 11h/40h. Fixes: 66929812955b ('iommu/amd: Add support for X2APIC IOMMU interrupts') Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | * | iommu/amd: Check feature support bit before accessing MSI capability registersSuravee Suthikulpanit2019-12-232-5/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The IOMMU MMIO access to MSI capability registers is available only if the EFR[MsiCapMmioSup] is set. Current implementation assumes this bit is set if the EFR[XtSup] is set, which might not be the case. Fix by checking the EFR[MsiCapMmioSup] before accessing the MSI address low/high and MSI data registers via the MMIO. Fixes: 66929812955b ('iommu/amd: Add support for X2APIC IOMMU interrupts') Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | * | iommu/amd: Treat per-device exclusion ranges as r/w unity-mapped regionsAdrian Huang2019-12-231-10/+10
| | | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some buggy BIOSes might define multiple exclusion ranges of the IVMD entries which are associated with the same IOMMU hardware. This leads to the overwritten exclusion range (exclusion_start and exclusion_length members) in set_device_exclusion_range(). Here is a real case: When attaching two Broadcom RAID controllers to a server, the first one reports the failure during booting (the disks connecting to the RAID controller cannot be detected). This patch prevents the issue by treating per-device exclusion ranges as r/w unity-mapped regions. Discussion: * https://lists.linuxfoundation.org/pipermail/iommu/2019-November/040140.html Suggested-by: Joerg Roedel <joro@8bytes.org> Signed-off-by: Adrian Huang <ahuang12@lenovo.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | * | iommu/arm-smmu-v3: Return -EBUSY when trying to re-add a deviceWill Deacon2020-01-151-21/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Although we WARN in arm_smmu_add_device() if the device being added has been added already without a subsequent call to arm_smmu_remove_device(), we still continue half-heartedly, initialising the stream-table for any new StreamIDs that may have magically appeared and re-establishing device links that should still be there from last time. Given that calling ->add_device() twice without removing the device in the meantime is indicative of an error in the caller, just return -EBUSY after warning. Cc: Robin Murphy <robin.murphy@arm.com> Cc: Jean Philippe-Brucker <jean-philippe@linaro.org> Signed-off-by: Will Deacon <will@kernel.org>
| | * | iommu/arm-smmu-v3: Improve add_device() error handlingJean-Philippe Brucker2020-01-151-7/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Let add_device() clean up after itself. The iommu_bus_init() function does call remove_device() on error, but other sites (e.g. of_iommu) do not. Don't free level-2 stream tables because we'd have to track if we allocated each of them or if they are used by other endpoints. It's not worth the hassle since they are managed resources. Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: Will Deacon <will@kernel.org>
| | * | iommu/arm-smmu-v3: Use WRITE_ONCE() when changing validity of an STEWill Deacon2020-01-151-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If, for some bizarre reason, the compiler decided to split up the write of STE DWORD 0, we could end up making a partial structure valid. Although this probably won't happen, follow the example of the context-descriptor code and use WRITE_ONCE() to ensure atomicity of the write. Reported-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: Will Deacon <will@kernel.org>
| | * | iommu/arm-smmu-v3: Add second level of context descriptor tableJean-Philippe Brucker2020-01-151-8/+134
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The SMMU can support up to 20 bits of SSID. Add a second level of page tables to accommodate this. Devices that support more than 1024 SSIDs now have a table of 1024 L1 entries (8kB), pointing to tables of 1024 context descriptors (64kB), allocated on demand. Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: Will Deacon <will@kernel.org>
| | * | iommu/arm-smmu-v3: Prepare for handling arm_smmu_write_ctx_desc() failureJean-Philippe Brucker2020-01-151-4/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Second-level context descriptor tables will be allocated lazily in arm_smmu_write_ctx_desc(). Help with handling allocation failure by moving the CD write into arm_smmu_domain_finalise_s1(). Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> [will: Add comment per discussion on list] Signed-off-by: Will Deacon <will@kernel.org>
OpenPOWER on IntegriCloud