path: root/drivers/infiniband/core
Commit message / Author / Age / Files / Lines
...
| * | RDMA/core: Use simpler spin lock irq API from blocking context (Author: Parav Pandit; Age: 2018-09-06; Files: 1; Lines: -11/+9)

    add_client_context(), ib_unregister_device() and ib_unregister_client() are designed to be called from blocking context. There is no need to save and restore the last interrupt state when they are called from such a context. Even though this is not a performance path, using the right spin lock API is desirable for code clarity. To avoid a checkpatch warning while removing flags, sizeof() is used.

    Signed-off-by: Parav Pandit <parav@mellanox.com>
    Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
    Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
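Editor's sketch: the difference between the save/restore and the plain irq lock variants can be shown with a toy user-space model. Everything here is hypothetical (the `toy_*` names and the single `irqs_enabled` flag stand in for the real CPU state and the kernel's `spin_lock_irqsave()` / `spin_lock_irq()` families); it only illustrates why the simpler pair is acceptable when the caller is known to run with interrupts enabled.

```c
#include <stdbool.h>

/* Toy stand-in for the CPU interrupt-enable flag (not kernel code). */
static bool irqs_enabled = true;

/* Models spin_lock_irqsave(): remember the previous state, then disable. */
static void toy_lock_irqsave(bool *flags)
{
    *flags = irqs_enabled;
    irqs_enabled = false;
}

/* Models spin_unlock_irqrestore(): put back whatever state was saved. */
static void toy_unlock_irqrestore(bool flags)
{
    irqs_enabled = flags;
}

/* Models spin_lock_irq(): disable without remembering anything. */
static void toy_lock_irq(void)
{
    irqs_enabled = false;
}

/* Models spin_unlock_irq(): unconditionally enable. */
static void toy_unlock_irq(void)
{
    irqs_enabled = true;
}

/* Run an empty critical section with the save/restore pair. */
static bool run_with_irqsave(bool initial)
{
    bool flags;

    irqs_enabled = initial;
    toy_lock_irqsave(&flags);
    toy_unlock_irqrestore(flags);
    return irqs_enabled;    /* always equals 'initial' */
}

/* Run an empty critical section with the plain irq pair. */
static bool run_with_irq(bool initial)
{
    irqs_enabled = initial;
    toy_lock_irq();
    toy_unlock_irq();
    return irqs_enabled;    /* always true, whatever 'initial' was */
}
```

In this model the plain pair only preserves the caller's state when interrupts were enabled on entry, which is the situation the commit message describes for blocking context.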
| * | RDMA/core: Remove context entries from list while unregistering device (Author: Parav Pandit; Age: 2018-09-06; Files: 1; Lines: -1/+5)

    While unregistering a device, remove the context elements from the list so that no stale entries remain. With that, any errors or bugs can be checked when the device is freed.

    Signed-off-by: Parav Pandit <parav@mellanox.com>
    Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
    Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * | RDMA/core: Use simplified list_for_each (Author: Parav Pandit; Age: 2018-09-06; Files: 1; Lines: -5/+4)

    While traversing client_data_list under the following conditions, the linked list is only read; no elements of the list are removed. Therefore, use list_for_each_entry() instead of list_for_each_safe().

    Signed-off-by: Parav Pandit <parav@mellanox.com>
    Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
    Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
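Editor's sketch: the reasoning behind choosing the non-"safe" iterator can be illustrated with a small user-space singly linked list. This is not the kernel's `struct list_head` API; `struct node`, `sum_values()`, `remove_negative()` and `push()` are hypothetical names. The point is that a look-ahead pointer is only needed when the loop body may unlink or free the current element.

```c
#include <stdlib.h>

struct node {
    int value;
    struct node *next;
};

/* Read-only walk: nothing is unlinked, so reading cur->next after the
 * loop body is always safe. This mirrors the plain iterator. */
static int sum_values(const struct node *head)
{
    int sum = 0;

    for (const struct node *cur = head; cur; cur = cur->next)
        sum += cur->value;
    return sum;
}

/* Removing walk: cur may be freed inside the body, so the successor is
 * fetched first. This mirrors what a "safe" iterator does. */
static struct node *remove_negative(struct node *head)
{
    struct node **link = &head;
    struct node *cur = head;
    struct node *next;

    while (cur) {
        next = cur->next;        /* saved before cur can be freed */
        if (cur->value < 0) {
            *link = next;        /* unlink cur */
            free(cur);
        } else {
            link = &cur->next;
        }
        cur = next;
    }
    return head;
}

/* Prepend a value; returns the new head (old head if allocation fails). */
static struct node *push(struct node *head, int value)
{
    struct node *n = malloc(sizeof(*n));

    if (!n)
        return head;
    n->value = value;
    n->next = head;
    return n;
}
```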
| * | RDMA/core: No need to protect kfree with spin lock and semaphore (Author: Parav Pandit; Age: 2018-09-06; Files: 1; Lines: -1/+1)

    While unregistering a client, only context removal should be protected with the lock. There is no need to protect the freeing of a context that has already been removed from the list.

    Signed-off-by: Parav Pandit <parav@mellanox.com>
    Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
    Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * | RDMA/{cma, core}: Avoid callback on rdma_addr_cancel() (Author: Parav Pandit; Age: 2018-09-06; Files: 2; Lines: -9/+7)

    Currently rdma_addr_cancel() is an async operation, which notifies that the cancel is done by executing the callback function given during rdma_resolve_ip(). If the resolve_ip request has already completed, then the callback is not executed.

    Instead, rdma_resolve_addr() and rdma_addr_cancel() are now simplified in the following ways:

    1. rdma_addr_cancel() is now a synchronous method. If a request was pending, no callback is notified after it is cancelled.
    2. rdma_resolve_addr() and the respective addr_handler() callback don't need to hold a reference to cm_id.

    Signed-off-by: Parav Pandit <parav@mellanox.com>
    Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * | RDMA/core: Rate limit MAD error messages (Author: Parav Pandit; Age: 2018-09-06; Files: 1; Lines: -35/+37)

    While registering a MAD agent, user space can trigger various errors and flood the logs. Therefore, decrease verbosity and rate limit such error messages. While we are at it, use __func__ to print the function name.

    Signed-off-by: Parav Pandit <parav@mellanox.com>
    Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
    Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
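Editor's sketch: the idea of rate limiting log messages can be shown with a small window-and-burst counter. This is a user-space illustration with hypothetical names (`struct toy_ratelimit`, `toy_ratelimit_ok()`) and caller-supplied tick values; it is not the kernel's ratelimit implementation, which the patch presumably reaches through the existing `*_ratelimited` print helpers.

```c
#include <stdbool.h>

/* Hypothetical limiter: at most 'burst' messages per 'interval' ticks. */
struct toy_ratelimit {
    long interval;  /* window length, in caller-defined ticks */
    int burst;      /* messages allowed per window */
    long begin;     /* start tick of the current window */
    int printed;    /* messages emitted in the current window */
    int missed;     /* messages suppressed in the current window */
};

/* Returns true if a message may be emitted at tick 'now'. */
static bool toy_ratelimit_ok(struct toy_ratelimit *rs, long now)
{
    if (now - rs->begin >= rs->interval) {
        /* The window expired: start a new one and reset the counters. */
        rs->begin = now;
        rs->printed = 0;
        rs->missed = 0;
    }
    if (rs->printed < rs->burst) {
        rs->printed++;
        return true;
    }
    rs->missed++;   /* suppressed; a real limiter may report this count */
    return false;
}
```

A caller would wrap each error print in `if (toy_ratelimit_ok(&rs, now))`, so a user-triggered error loop produces at most `burst` lines per window.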
| * | RDMA/core: Fail early if unsupported QP is provided (Author: Parav Pandit; Age: 2018-09-06; Files: 1; Lines: -0/+4)

    When the requested QP type is not supported for a {device, port}, return the error right away, before validating all parameters at MAD agent registration time.

    Signed-off-by: Parav Pandit <parav@mellanox.com>
    Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
    Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * | Merge branch 'uverbs_dev_cleanups' into rdma.git for-next (Author: Jason Gunthorpe; Age: 2018-09-05; Files: 5; Lines: -84/+81)
| |\ \

    For dependencies, branch based on rdma.git 'for-rc' of https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/

    Pull 'uverbs_dev_cleanups' from Leon Romanovsky:

    ====================
    Reuse the char device code interfaces to simplify ib_uverbs_device creation and destruction. As part of this series, we are sending a fix to the cleanup path, which was discovered during internal review. The fix can definitely go to -rc, but that means this series will depend on rdma-rc.
    ====================

    * branch 'uverbs_dev_cleanups':
      RDMA/uverbs: Use device.groups to initialize device attributes
      RDMA/uverbs: Use cdev_device_add() instead of cdev_add()
      RDMA/core: Depend on device_add() to add device attributes
      RDMA/uverbs: Fix error cleanup path of ib_uverbs_add_one()

    Resolved conflict in ib_device_unregister_sysfs()

    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| | * | RDMA/uverbs: Use device.groups to initialize device attributes (Author: Parav Pandit; Age: 2018-09-05; Files: 2; Lines: -13/+19)

    Instead of explicitly adding device attribute files and handling the resulting error conditions, depend on the device core layer to create the device attribute files from a NULL-terminated array of group pointers.

    Signed-off-by: Parav Pandit <parav@mellanox.com>
    Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
    Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| | * | RDMA/uverbs: Use cdev_device_add() instead of cdev_add() (Author: Parav Pandit; Age: 2018-09-05; Files: 2; Lines: -39/+30)

    Instead of a two-step process to add the char device and create the underlying device, use cdev_device_add(), which does both.

    Currently a kobject per uverbs_device is created to keep a reference to its holding ib_uverbs_device, in addition to its underlying device 'dev'. Instead, just use uverbs_device->dev to keep the reference. With this change there is a single reference tracker for the ib_uverbs_device structure.

    This allows a subsequent patch to register the group attributes as well, using the single API cdev_device_add().

    Signed-off-by: Parav Pandit <parav@mellanox.com>
    Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
    Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| | * | RDMA/core: Depend on device_add() to add device attributes (Author: Parav Pandit; Age: 2018-09-05; Files: 1; Lines: -34/+27)

    Instead of adding and removing device attribute files explicitly, depend on device_add(), which adds these device files based on a NULL-terminated attribute group array.

    Signed-off-by: Parav Pandit <parav@mellanox.com>
    Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
    Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * | | RDMA/core: Replace open-coded variant of get_device (Author: Parav Pandit; Age: 2018-09-05; Files: 1; Lines: -2/+2)

    Reuse the existing get_device() API so that it is symmetric to the put_device() already used in commit 924b8900a49d ("RDMA/core: Replace open-coded variant of put_device").

    Signed-off-by: Parav Pandit <parav@mellanox.com>
    Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * | | RDMA/uverbs: Declare closing variable as boolean (Author: Leon Romanovsky; Age: 2018-09-05; Files: 1; Lines: -1/+1)

    The "closing" variable is used as a boolean and set to "true" in one place. Update the declaration of that variable and its other assignment to the proper type.

    Fixes: e951747a087a ("IB/uverbs: Rework the locking for cleaning up the ucontext")
    Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * | | IB/core: Add an unbound WQ type to the new CQ API (Author: Jack Morgenstein; Age: 2018-09-05; Files: 3; Lines: -4/+21)

    The upstream kernel commit cited below modified the workqueue in the new CQ API to be bound to a specific CPU (instead of being unbound). This caused ALL users of the new CQ API to use the same bound WQ.

    Specifically, MAD handling was severely delayed when the CPU bound to the WQ was busy handling (higher priority) interrupts. This caused a delay in the MAD "heartbeat" response handling, which resulted in ports being incorrectly classified as "down".

    To fix this, add a new "unbound" WQ type to the new CQ API, so that users have the option to choose either a bound WQ or an unbound WQ. For MADs, choose the new "unbound" WQ.

    Fixes: b7363e67b23e ("IB/device: Convert ib-comp-wq to be CPU-bound")
    Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
    Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
    Reviewed-by: Sagi Grimberg <sagi@grimberg.m>
    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * | | RDMA/uverbs: Add generic function to fill in flow action object (Author: Mark Bloch; Age: 2018-09-05; Files: 1; Lines: -5/+2)

    Refactor the initialization of a flow action object to a common function.

    Signed-off-by: Mark Bloch <markb@mellanox.com>
    Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * | | RDMA/uverbs: Add UVERBS_ATTR_CONST_IN to the specs language (Author: Mark Bloch; Age: 2018-09-05; Files: 1; Lines: -0/+23)

    This makes it clear and safe to access constants passed in from user space. We define a consistent ABI of u64 for all constants, and verify that the data passed in can be represented by the type the user supplies.

    The expectation is this will always be used with an enum declaring the constant values, and the user will use the enum type as input to the accessor.

    To retrieve the attribute value we introduce two helper calls - one standard which may fail if attribute is not valid and one where caller can provide a default value which will be used in case the attribute is not valid (useful when attribute is optional).

    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
    Signed-off-by: Mark Bloch <markb@mellanox.com>
    Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
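Editor's sketch: the two accessor styles described above (a strict one that may fail, and one that falls back to a caller-supplied default) can be modeled in user space as follows. All names (`struct toy_attr`, `toy_get_const()`, `toy_get_const_default()`) are hypothetical and are not the uverbs helpers; the range check against a caller-supplied maximum is a simplification of "can be represented by the type the user supplies".

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical attribute slot: 'present' says whether user space sent it. */
struct toy_attr {
    bool present;
    uint64_t raw;   /* constants always travel as a 64-bit value */
};

/* Strict accessor: fails if the attribute is absent or out of range. */
static int toy_get_const(const struct toy_attr *attr, uint64_t max,
                         uint64_t *out)
{
    if (!attr->present)
        return -ENOENT;
    if (attr->raw > max)
        return -EINVAL;   /* value does not fit the destination type */
    *out = attr->raw;
    return 0;
}

/* Lenient accessor: an absent attribute yields the caller's default,
 * but a present, out-of-range value is still rejected. */
static int toy_get_const_default(const struct toy_attr *attr, uint64_t max,
                                 uint64_t def, uint64_t *out)
{
    if (!attr->present) {
        *out = def;
        return 0;
    }
    return toy_get_const(attr, max, out);
}
```

The default-taking form suits optional attributes, where absence is not an error but garbage input still must be refused.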
* | | | | Merge tag 'pci-v4.20-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci (Author: Linus Torvalds; Age: 2018-10-25; Files: 1; Lines: -2/+9)
|\ \ \ \

    Pull PCI updates from Bjorn Helgaas:

    - Fix ASPM link_state teardown on removal (Lukas Wunner)
    - Fix misleading _OSC ASPM message (Sinan Kaya)
    - Make _OSC optional for PCI (Sinan Kaya)
    - Don't initialize ASPM link state when ACPI_FADT_NO_ASPM is set (Patrick Talbert)
    - Remove x86 and arm64 node-local allocation for host bridge structures (Punit Agrawal)
    - Pay attention to device-specific _PXM node values (Jonathan Cameron)
    - Support new Immediate Readiness bit (Felipe Balbi)
    - Differentiate between pciehp surprise and safe removal (Lukas Wunner)
    - Remove unnecessary pciehp includes (Lukas Wunner)
    - Drop pciehp hotplug_slot_ops wrappers (Lukas Wunner)
    - Tolerate PCIe Slot Presence Detect being hardwired to zero to workaround broken hardware, e.g., the Wilocity switch/wireless device (Lukas Wunner)
    - Unify pciehp controller & slot structs (Lukas Wunner)
    - Constify hotplug_slot_ops (Lukas Wunner)
    - Drop hotplug_slot_info (Lukas Wunner)
    - Embed hotplug_slot struct into users instead of allocating it separately (Lukas Wunner)
    - Initialize PCIe port service drivers directly instead of relying on initcall ordering (Keith Busch)
    - Restore PCI config state after a slot reset (Keith Busch)
    - Save/restore DPC config state along with other PCI config state (Keith Busch)
    - Reference count devices during AER handling to avoid race issue with concurrent hot removal (Keith Busch)
    - If an Upstream Port reports ERR_FATAL, don't try to read the Port's config space because it is probably unreachable (Keith Busch)
    - During error handling, use slot-specific reset instead of secondary bus reset to avoid link up/down issues on hotplug ports (Keith Busch)
    - Restore previous AER/DPC handling that does not remove and re-enumerate devices on ERR_FATAL (Keith Busch)
    - Notify all drivers that may be affected by error recovery resets (Keith Busch)
    - Always generate error recovery uevents, even if a driver doesn't have error callbacks (Keith Busch)
    - Make PCIe link active reporting detection generic (Keith Busch)
    - Support D3cold in PCIe hierarchies during system sleep and runtime, including hotplug and Thunderbolt ports (Mika Westerberg)
    - Handle hpmemsize/hpiosize kernel parameters uniformly, whether slots are empty or occupied (Jon Derrick)
    - Remove duplicated include from pci/pcie/err.c and unused variable from cpqphp (YueHaibing)
    - Remove driver pci_cleanup_aer_uncorrect_error_status() calls (Oza Pawandeep)
    - Uninline PCI bus accessors for better ftracing (Keith Busch)
    - Remove unused AER Root Port .error_resume method (Keith Busch)
    - Use kfifo in AER instead of a local version (Keith Busch)
    - Use threaded IRQ in AER bottom half (Keith Busch)
    - Use managed resources in AER core (Keith Busch)
    - Reuse pcie_port_find_device() for AER injection (Keith Busch)
    - Abstract AER interrupt handling to disconnect error injection (Keith Busch)
    - Refactor AER injection callbacks to simplify future improvements (Keith Busch)
    - Remove unused Netronome NFP32xx Device IDs (Jakub Kicinski)
    - Use bitmap_zalloc() for dma_alias_mask (Andy Shevchenko)
    - Add switch fall-through annotations (Gustavo A. R. Silva)
    - Remove unused Switchtec quirk variable (Joshua Abraham)
    - Fix pci.c kernel-doc warning (Randy Dunlap)
    - Remove trivial PCI wrappers for DMA APIs (Christoph Hellwig)
    - Add Intel GPU device IDs to spurious interrupt quirk (Bin Meng)
    - Run Switchtec DMA aliasing quirk only on NTB endpoints to avoid useless dmesg errors (Logan Gunthorpe)
    - Update Switchtec NTB documentation (Wesley Yung)
    - Remove redundant "default n" from Kconfig (Bartlomiej Zolnierkiewicz)
    - Avoid panic when drivers enable MSI/MSI-X twice (Tonghao Zhang)
    - Add PCI support for peer-to-peer DMA (Logan Gunthorpe)
    - Add sysfs group for PCI peer-to-peer memory statistics (Logan Gunthorpe)
    - Add PCI peer-to-peer DMA scatterlist mapping interface (Logan Gunthorpe)
    - Add PCI configfs/sysfs helpers for use by peer-to-peer users (Logan Gunthorpe)
    - Add PCI peer-to-peer DMA driver writer's documentation (Logan Gunthorpe)
    - Add block layer flag to indicate driver support for PCI peer-to-peer DMA (Logan Gunthorpe)
    - Map Infiniband scatterlists for peer-to-peer DMA if they contain P2P memory (Logan Gunthorpe)
    - Register nvme-pci CMB buffer as PCI peer-to-peer memory (Logan Gunthorpe)
    - Add nvme-pci support for PCI peer-to-peer memory in requests (Logan Gunthorpe)
    - Use PCI peer-to-peer memory in nvme (Stephen Bates, Steve Wise, Christoph Hellwig, Logan Gunthorpe)
    - Cache VF config space size to optimize enumeration of many VFs (KarimAllah Ahmed)
    - Remove unnecessary <linux/pci-ats.h> include (Bjorn Helgaas)
    - Fix VMD AERSID quirk Device ID matching (Jon Derrick)
    - Fix Cadence PHY handling during probe (Alan Douglas)
    - Signal Cadence Endpoint interrupts via AXI region 0 instead of last region (Alan Douglas)
    - Write Cadence Endpoint MSI interrupts with 32 bits of data (Alan Douglas)
    - Remove redundant controller tests for "device_type == pci" (Rob Herring)
    - Document R-Car E3 (R8A77990) bindings (Tho Vu)
    - Add device tree support for R-Car r8a7744 (Biju Das)
    - Drop unused mvebu PCIe capability code (Thomas Petazzoni)
    - Add shared PCI bridge emulation code (Thomas Petazzoni)
    - Convert mvebu to use shared PCI bridge emulation (Thomas Petazzoni)
    - Add aardvark Root Port emulation (Thomas Petazzoni)
    - Support 100MHz/200MHz refclocks for i.MX6 (Lucas Stach)
    - Add initial power management for i.MX7 (Leonard Crestez)
    - Add PME_Turn_Off support for i.MX7 (Leonard Crestez)
    - Fix qcom runtime power management error handling (Bjorn Andersson)
    - Update TI dra7xx unaligned access errata workaround for host mode as well as endpoint mode (Vignesh R)
    - Fix kirin section mismatch warning (Nathan Chancellor)
    - Remove iproc PAXC slot check to allow VF support (Jitendra Bhivare)
    - Quirk Keystone K2G to limit MRRS to 256 (Kishon Vijay Abraham I)
    - Update Keystone to use MRRS quirk for host bridge instead of open coding (Kishon Vijay Abraham I)
    - Refactor Keystone link establishment (Kishon Vijay Abraham I)
    - Simplify and speed up Keystone link training (Kishon Vijay Abraham I)
    - Remove unused Keystone host_init argument (Kishon Vijay Abraham I)
    - Merge Keystone driver files into one (Kishon Vijay Abraham I)
    - Remove redundant Keystone platform_set_drvdata() (Kishon Vijay Abraham I)
    - Rename Keystone functions for uniformity (Kishon Vijay Abraham I)
    - Add Keystone device control module DT binding (Kishon Vijay Abraham I)
    - Use SYSCON API to get Keystone control module device IDs (Kishon Vijay Abraham I)
    - Clean up Keystone PHY handling (Kishon Vijay Abraham I)
    - Use runtime PM APIs to enable Keystone clock (Kishon Vijay Abraham I)
    - Clean up Keystone config space access checks (Kishon Vijay Abraham I)
    - Get Keystone outbound window count from DT (Kishon Vijay Abraham I)
    - Clean up Keystone outbound window configuration (Kishon Vijay Abraham I)
    - Clean up Keystone DBI setup (Kishon Vijay Abraham I)
    - Clean up Keystone ks_pcie_link_up() (Kishon Vijay Abraham I)
    - Fix Keystone IRQ status checking (Kishon Vijay Abraham I)
    - Add debug messages for all Keystone errors (Kishon Vijay Abraham I)
    - Clean up Keystone includes and macros (Kishon Vijay Abraham I)
    - Fix Mediatek unchecked return value from devm_pci_remap_iospace() (Gustavo A. R. Silva)
    - Fix Mediatek endpoint/port matching logic (Honghui Zhang)
    - Change Mediatek Root Port Class Code to PCI_CLASS_BRIDGE_PCI (Honghui Zhang)
    - Remove redundant Mediatek PM domain check (Honghui Zhang)
    - Convert Mediatek to pci_host_probe() (Honghui Zhang)
    - Fix Mediatek MSI enablement (Honghui Zhang)
    - Add Mediatek system PM support for MT2712 and MT7622 (Honghui Zhang)
    - Add Mediatek loadable module support (Honghui Zhang)
    - Detach VMD resources after stopping root bus to prevent orphan resources (Jon Derrick)
    - Convert pcitest build process to that used by other tools (iio, perf, etc) (Gustavo Pimentel)

    * tag 'pci-v4.20-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (140 commits)
      PCI/AER: Refactor error injection fallbacks
      PCI/AER: Abstract AER interrupt handling
      PCI/AER: Reuse existing pcie_port_find_device() interface
      PCI/AER: Use managed resource allocations
      PCI: pcie: Remove redundant 'default n' from Kconfig
      PCI: aardvark: Implement emulated root PCI bridge config space
      PCI: mvebu: Convert to PCI emulated bridge config space
      PCI: mvebu: Drop unused PCI express capability code
      PCI: Introduce PCI bridge emulated config space common logic
      PCI: vmd: Detach resources after stopping root bus
      nvmet: Optionally use PCI P2P memory
      nvmet: Introduce helper functions to allocate and free request SGLs
      nvme-pci: Add support for P2P memory in requests
      nvme-pci: Use PCI p2pmem subsystem to manage the CMB
      IB/core: Ensure we map P2P memory correctly in rdma_rw_ctx_[init|destroy]()
      block: Add PCI P2P flag for request queue
      PCI/P2PDMA: Add P2P DMA driver writer's documentation
      docs-rst: Add a new directory for PCI documentation
      PCI/P2PDMA: Introduce configfs/sysfs enable attribute helpers
      PCI/P2PDMA: Add PCI p2pmem DMA mappings to adjust the bus offset
      ...
| * | | | IB/core: Ensure we map P2P memory correctly in rdma_rw_ctx_[init|destroy]() (Author: Logan Gunthorpe; Age: 2018-10-17; Files: 1; Lines: -2/+9)

    In order to use PCI P2P memory the pci_p2pmem_map_sg() function must be called to map the correct PCI bus address.

    To do this, check the first page in the scatter list to see if it is P2P memory or not. At the moment, scatter lists that contain P2P memory must be homogeneous so if the first page is P2P the entire SGL should be P2P.

    Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
* | | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (Author: David S. Miller; Age: 2018-10-19; Files: 2; Lines: -0/+6)
|\ \ \ \ \

    net/sched/cls_api.c has overlapping changes to a call to nlmsg_parse(), one (from 'net') added rtm_tca_policy instead of NULL to the 5th argument, and another (from 'net-next') added cb->extack instead of NULL to the 6th argument.

    net/ipv4/ipmr_base.c is a case of a bug fix in 'net' being done to code which moved (to mr_table_dump) in 'net-next'. Thanks to David Ahern for the heads up.

    Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | RDMA/ucma: Fix Spectre v1 vulnerability (Author: Gustavo A. R. Silva; Age: 2018-10-16; Files: 1; Lines: -0/+3)

    hdr.cmd can be indirectly controlled by user-space, hence leading to a potential exploitation of the Spectre variant 1 vulnerability.

    This issue was detected with the help of Smatch:

    drivers/infiniband/core/ucma.c:1686 ucma_write() warn: potential spectre issue 'ucma_cmd_table' [r] (local cap)

    Fix this by sanitizing hdr.cmd before using it to index ucma_cmd_table.

    Notice that given that speculation windows are large, the policy is to kill the speculation on the first load and not worry if it can be completed with a dependent load/store [1].

    [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2

    Cc: stable@vger.kernel.org
    Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | | IB/ucm: Fix Spectre v1 vulnerability (Author: Gustavo A. R. Silva; Age: 2018-10-16; Files: 1; Lines: -0/+3)
| | |_|_|/
| |/| | |

    hdr.cmd can be indirectly controlled by user-space, hence leading to a potential exploitation of the Spectre variant 1 vulnerability.

    This issue was detected with the help of Smatch:

    drivers/infiniband/core/ucm.c:1127 ib_ucm_write() warn: potential spectre issue 'ucm_cmd_table' [r] (local cap)

    Fix this by sanitizing hdr.cmd before using it to index ucm_cmd_table.

    Notice that given that speculation windows are large, the policy is to kill the speculation on the first load and not worry if it can be completed with a dependent load/store [1].

    [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2

    Cc: stable@vger.kernel.org
    Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
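Editor's sketch: "sanitizing hdr.cmd before using it to index" a command table amounts to clamping the index with branch-free arithmetic after the bounds check, so that a mispredicted check cannot steer a speculative out-of-bounds load. The user-space code below only illustrates that mask arithmetic. The `toy_*` names and the two-entry table are hypothetical, the right shift of a negative `long` is assumed to be arithmetic (true for common compilers, but implementation-defined in C), and an ordinary compiler is free to optimize this differently, so this is not a hardened mitigation.

```c
#include <limits.h>
#include <stddef.h>

#define TOY_BITS_PER_LONG (sizeof(long) * CHAR_BIT)

/* All-ones when index < size, zero otherwise, computed without a branch.
 * Assumes an arithmetic right shift for negative signed values. */
static unsigned long toy_index_mask(unsigned long index, unsigned long size)
{
    return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
                           (TOY_BITS_PER_LONG - 1));
}

/* Clamp an index: in-range values pass through, others become 0. */
static unsigned long toy_index_nospec(unsigned long index, unsigned long size)
{
    return index & toy_index_mask(index, size);
}

typedef int (*toy_handler)(void);

static int toy_cmd_a(void) { return 1; }
static int toy_cmd_b(void) { return 2; }

/* Hypothetical command table indexed by a user-controlled value. */
static const toy_handler toy_cmd_table[] = { toy_cmd_a, toy_cmd_b };
#define TOY_NCMDS (sizeof(toy_cmd_table) / sizeof(toy_cmd_table[0]))

static int toy_dispatch(unsigned long cmd)
{
    if (cmd >= TOY_NCMDS)
        return -1;                            /* architectural bounds check */
    cmd = toy_index_nospec(cmd, TOY_NCMDS);   /* clamp before the table load */
    return toy_cmd_table[cmd]();
}
```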
* | | | | RDMA/netdev: Fix netlink support in IPoIB (Author: Denis Drozdov; Age: 2018-10-10; Files: 1; Lines: -7/+21)

    IPoIB netlink support was broken by the below commit since integrating the rdma_netdev support relies on an allocation flow for netdevs that was controlled by the ipoib driver, while netdev's rtnl_newlink implementation assumes that the netdev will be allocated by netlink. Such a situation leads to a crash in __ipoib_device_add once it tries to reuse the netlink device.

    This patch fixes the kernel oops for both mlx4 and mlx5 devices triggered by the following command:

    Fixes: cd565b4b51e5 ("IB/IPoIB: Support acceleration options callbacks")
    Signed-off-by: Denis Drozdov <denisd@mellanox.com>
    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    Signed-off-by: Feras Daoud <ferasda@mellanox.com>
    Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* | | | | RDMA/netdev: Hoist alloc_netdev_mqs out of the driver (Author: Denis Drozdov; Age: 2018-10-10; Files: 1; Lines: -0/+32)
|/ / / /

    netdev has several interfaces that expect to call alloc_netdev_mqs from the core code, with the driver only providing the arguments. This is incompatible with the rdma_netdev interface that returns the netdev directly.

    Thus re-organize the API used by ipoib so that the verbs core code calls alloc_netdev_mqs for the driver. This is done by allowing the drivers to provide the allocation parameters via a 'get_params' callback and then initializing an allocated netdev as a second step.

    Fixes: cd565b4b51e5 ("IB/IPoIB: Support acceleration options callbacks")
    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    Signed-off-by: Denis Drozdov <denisd@mellanox.com>
    Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* | | | RDMA/core: Set right entry state before releasing reference (Author: Parav Pandit; Age: 2018-09-25; Files: 1; Lines: -34/+34)

    Currently add_modify_gid() for the IB link layer has the following issue in the cache update path.

    When a GID update event occurs, the core releases its reference to the GID table without updating its state and/or entry pointer.

    CPU-0                              CPU-1
    ------                             -----
    ib_cache_update()                  IPoIB ULP
      add_modify_gid()                 [..]
        put_gid_entry()
        refcnt = 0, but
        state = valid,
        entry is valid.
        (work item is not yet executed).
                                       ipoib_create_ah()
                                         rdma_create_ah()
                                           rdma_get_gid_attr() <--
                                           Tries to acquire gid_attr
                                           which has refcnt = 0.
                                           This is incorrect.

    The GID entry state and entry pointer provide the accurate GID entry state. Such fields must be updated under the rwlock to protect against readers, and such fields must be in a sane state before the refcount can drop to zero. Otherwise the above race condition can happen, leading to a use-after-free situation.

    The following backtrace has been observed when a cache update for an IB port is triggered while the IPoIB ULP is creating an AH.

    Therefore, when updating a GID entry, first mark a valid entry as invalid through its state and set the barrier so that no callers can acquire the GID entry, followed by releasing the reference to it.

    refcount_t: increment on 0; use-after-free.
    WARNING: CPU: 4 PID: 29106 at lib/refcount.c:153 refcount_inc_checked+0x30/0x50
    Workqueue: ib-comp-unb-wq ib_cq_poll_work [ib_core]
    RIP: 0010:refcount_inc_checked+0x30/0x50
    RSP: 0018:ffff8802ad36f600 EFLAGS: 00010082
    RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
    RDX: 0000000000000002 RSI: 0000000000000008 RDI: ffffffff86710100
    RBP: ffff8802d6e60a30 R08: ffffed005d67bf8b R09: ffffed005d67bf8b
    R10: 0000000000000001 R11: ffffed005d67bf8a R12: ffff88027620cee8
    R13: ffff8802d6e60988 R14: ffff8802d6e60a78 R15: 0000000000000202
    FS: 0000000000000000(0000) GS:ffff8802eb200000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007f3ab35e5c88 CR3: 00000002ce84a000 CR4: 00000000000006e0
    IPv6: ADDRCONF(NETDEV_CHANGE): ib1: link becomes ready
    Call Trace:
     rdma_get_gid_attr+0x220/0x310 [ib_core]
     ? lock_acquire+0x145/0x3a0
     rdma_fill_sgid_attr+0x32c/0x470 [ib_core]
     rdma_create_ah+0x89/0x160 [ib_core]
     ? rdma_fill_sgid_attr+0x470/0x470 [ib_core]
     ? ipoib_create_ah+0x52/0x260 [ib_ipoib]
     ipoib_create_ah+0xf5/0x260 [ib_ipoib]
     ipoib_mcast_join_complete+0xbbe/0x2540 [ib_ipoib]

    Fixes: b150c3862d21 ("IB/core: Introduce GID entry reference counts")
    Signed-off-by: Parav Pandit <parav@mellanox.com>
    Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
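Editor's sketch: the ordering rule in this fix (make the entry unreachable and non-valid first, drop the table's reference second) can be modeled with a single-slot table. This is a deliberately single-threaded illustration with hypothetical names (`struct toy_entry`, `toy_get()`, `toy_del()`, `toy_put()`); the comments mark where the real code would hold the table's rwlock, and no actual locking or memory barriers are shown.

```c
#include <stdbool.h>
#include <stddef.h>

enum toy_state { TOY_INVALID, TOY_VALID, TOY_PENDING_DEL };

struct toy_entry {
    enum toy_state state;
    int refcount;
};

/* Hypothetical one-slot table; the table itself owns one reference. */
struct toy_table {
    struct toy_entry *slot;
};

/* Lookup (would run under the read lock): only VALID entries that are
 * still published in the table can gain a new reference. */
static struct toy_entry *toy_get(struct toy_table *t)
{
    struct toy_entry *e = t->slot;

    if (!e || e->state != TOY_VALID)
        return NULL;
    e->refcount++;
    return e;
}

/* Drop one reference; returns true when it was the last one. */
static bool toy_put(struct toy_entry *e)
{
    return --e->refcount == 0;
}

/* Delete: first mark the entry and unpublish it (would run under the
 * write lock), and only then drop the table's own reference. */
static bool toy_del(struct toy_table *t)
{
    struct toy_entry *e = t->slot;

    if (!e)
        return false;
    e->state = TOY_PENDING_DEL;  /* step 1: lookups now refuse it */
    t->slot = NULL;              /* step 2: unpublish the pointer */
    return toy_put(e);           /* step 3: release the table's ref */
}
```

With this ordering, a lookup that arrives after the delete has started can never take a reference on an entry whose count is about to reach, or has already reached, zero.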
* | | | IB/uverbs: Free uapi on destroyMark Bloch2018-09-251-0/+1
    Make sure we free struct uverbs_api once we clean the radix tree. It was
    allocated by uverbs_alloc_api().

    Fixes: 9ed3e5f44772 ("IB/uverbs: Build the specs into a radix tree at runtime")
    Reported-by: Bart Van Assche <bvanassche@acm.org>
    Signed-off-by: Mark Bloch <markb@mellanox.com>
    Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | | | RDMA/uverbs: Fix validity check for modify QPMajd Dibbiny2018-09-201-23/+45
    Uverbs shouldn't enforce QP state in the command unless the user set the QP
    state bit in the attribute mask.

    In addition, only copy qp attr fields which have the corresponding bit set
    in the attribute mask over to the internal attr structure.

    Fixes: 88de869bbe4f ("RDMA/uverbs: Ensure validity of current QP state value")
    Fixes: bc38a6abdd5a ("[PATCH] IB uverbs: core implementation")
    Signed-off-by: Majd Dibbiny <majd@mellanox.com>
    Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
    Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
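A minimal sketch of the mask-driven rule described above: validate and copy a field only when its bit is set in the attribute mask. The mask bits, the `qp_attr_sketch` layout and the `MAX_STATE_SKETCH` limit are made up for illustration and are not the uverbs ABI.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mask bits and attribute layout, for illustration only. */
#define ATTR_STATE_SKETCH     (1u << 0)
#define ATTR_CUR_STATE_SKETCH (1u << 1)
#define ATTR_PORT_SKETCH      (1u << 2)
#define MAX_STATE_SKETCH      6u

struct qp_attr_sketch {
    uint32_t qp_state;
    uint32_t cur_qp_state;
    uint32_t port_num;
};

/* Returns 0 on success, -1 if a field selected by the mask is invalid. */
static int copy_masked_attr(struct qp_attr_sketch *dst,
                            const struct qp_attr_sketch *cmd,
                            uint32_t mask)
{
    /* Enforce state validity only for fields the caller asked to modify. */
    if ((mask & ATTR_STATE_SKETCH) && cmd->qp_state > MAX_STATE_SKETCH)
        return -1;
    if ((mask & ATTR_CUR_STATE_SKETCH) && cmd->cur_qp_state > MAX_STATE_SKETCH)
        return -1;

    /* Copy only the selected fields; unselected ones keep dst's values. */
    if (mask & ATTR_STATE_SKETCH)
        dst->qp_state = cmd->qp_state;
    if (mask & ATTR_CUR_STATE_SKETCH)
        dst->cur_qp_state = cmd->cur_qp_state;
    if (mask & ATTR_PORT_SKETCH)
        dst->port_num = cmd->port_num;
    return 0;
}
```

A command carrying garbage in `qp_state` is accepted as long as the state bit is clear, and the garbage never reaches the internal structure.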
* | | | ucma: fix a use-after-free in ucma_resolve_ip()Cong Wang2018-09-131-0/+2
    There is a race condition between ucma_close() and ucma_resolve_ip():

    CPU0                                CPU1
    ucma_resolve_ip():                  ucma_close():

    ctx = ucma_get_ctx(file, cmd.id);

                                        list_for_each_entry_safe(ctx, tmp, &file->ctx_list, list) {
                                                mutex_lock(&mut);
                                                idr_remove(&ctx_idr, ctx->id);
                                                mutex_unlock(&mut);
                                                ...
                                                mutex_lock(&mut);
                                                if (!ctx->closing) {
                                                        mutex_unlock(&mut);
                                                        rdma_destroy_id(ctx->cm_id);
                                                ...
                                                ucma_free_ctx(ctx);

    ret = rdma_resolve_addr();
    ucma_put_ctx(ctx);

    Before idr_remove(), ucma_get_ctx() could still find the ctx, and after
    rdma_destroy_id(), rdma_resolve_addr() may still access the id_priv
    pointer. Also, ucma_put_ctx() may use ctx after ucma_free_ctx().

    ucma_close() should call ucma_put_ctx() too, which tests the refcnt and
    waits for the last holder to release it. A similar pattern is already used
    by ucma_destroy_id().

    Reported-and-tested-by: syzbot+da2591e115d57a9cbb8b@syzkaller.appspotmail.com
    Reported-by: syzbot+cfe3c1e8ef634ba8964b@syzkaller.appspotmail.com
    Cc: Jason Gunthorpe <jgg@mellanox.com>
    Cc: Doug Ledford <dledford@redhat.com>
    Cc: Leon Romanovsky <leon@kernel.org>
    Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
    Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
* | | | RDMA/uverbs: Atomically flush and mark closed the comp event queueSteve Wise2018-09-121-0/+1
|/ / /
    Currently a uverbs completion event queue is flushed of events in
    ib_uverbs_comp_event_close() with the queue spinlock held and then
    released. Yet ev_queue->is_closed is not set until later, in
    uverbs_hot_unplug_completion_event_file().

    In between the time ib_uverbs_comp_event_close() releases the lock and
    uverbs_hot_unplug_completion_event_file() acquires the lock, a completion
    event can arrive and be inserted into the event queue by
    ib_uverbs_comp_handler().

    This can cause a "double add" list_add warning or crash depending on the
    kernel configuration, or a memory leak because the event is never dequeued
    since the queue is already closed down.

    So set ev_queue->is_closed = 1 in ib_uverbs_comp_event_close().

    Cc: stable@vger.kernel.org
    Fixes: 1e7710f3f656 ("IB/core: Change completion channel to use the reworked objects schema")
    Signed-off-by: Steve Wise <swise@opengridcomputing.com>
    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
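The fix above amounts to flushing the queue and marking it closed inside one critical section, so that no producer can slip in between the two steps. A minimal userspace sketch, assuming a hypothetical `evq_sketch` type and using a C11 `atomic_flag` spinlock in place of the kernel spinlock:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical event queue; pending counts queued events. */
struct evq_sketch {
    atomic_flag lock;
    bool is_closed;
    int pending;
};

static void evq_lock(struct evq_sketch *q)
{
    while (atomic_flag_test_and_set(&q->lock))
        ; /* spin until the flag is released */
}

static void evq_unlock(struct evq_sketch *q)
{
    atomic_flag_clear(&q->lock);
}

/* Producer: refuses to queue once the queue has been closed. */
static bool evq_add(struct evq_sketch *q)
{
    bool queued = false;

    evq_lock(q);
    if (!q->is_closed) {
        q->pending++;
        queued = true;
    }
    evq_unlock(q);
    return queued;
}

/* Close: flush and mark closed in the same critical section. */
static int evq_close(struct evq_sketch *q)
{
    int flushed;

    evq_lock(q);
    flushed = q->pending;
    q->pending = 0;
    q->is_closed = true;
    evq_unlock(q);
    return flushed;
}
```

Because `is_closed` is written before the lock is dropped, an event arriving after the flush is rejected rather than left on a queue nobody will drain.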
* | / RDMA/cma: Protect cma dev list with lockParav Pandit2018-09-061-5/+7
| |/
|/|
    When AF_IB addresses are used during rdma_resolve_addr(), a lock is not
    held. A cma device can get removed while list traversal is in progress,
    which may lead to a crash, i.e.

            CPU0                                     CPU1
            ====                                     ====
    rdma_resolve_addr()
     cma_resolve_ib_dev()
      list_for_each()                         cma_remove_one()
        cur_dev->device                         mutex_lock(&lock)
                                                 list_del();
                                               mutex_unlock(&lock);
                                               cma_process_remove();

    Therefore, hold the lock while traversing the list, which avoids this
    situation.

    Cc: <stable@vger.kernel.org> # 3.10
    Fixes: f17df3b0dede ("RDMA/cma: Add support for AF_IB to rdma_resolve_addr()")
    Signed-off-by: Parav Pandit <parav@mellanox.com>
    Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
    Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
    Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/uverbs: Fix error cleanup path of ib_uverbs_add_one()Parav Pandit2018-09-051-3/+2
    If ib_uverbs_create_uapi() fails, dev_num should be freed from the bitmap.

    Fixes: 7d96c9b17636 ("IB/uverbs: Have the core code create the uverbs_root_spec")
    Signed-off-by: Parav Pandit <parav@mellanox.com>
    Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | IB/core: Release object lock if destroy failedArtemy Kovalyov2018-09-041-0/+2
    The object lock was supposed to always be released during destroy, but when
    the destruction retry series was integrated with the destroy series it
    created a failure path that missed the unlock.

    In keeping with convention, if destroy fails the caller must undo all
    locking.

    Fixes: 87ad80abc70d ("IB/uverbs: Consolidate uobject destruction")
    Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
    Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/ucma: check fd type in ucma_migrate_id()Jann Horn2018-09-041-0/+6
|/
    The current code grabs the private_data of whatever file descriptor
    userspace has supplied and implicitly casts it to a `struct ucma_file *`,
    potentially causing a type confusion.

    This is probably fine in practice because the pointer is only used for
    comparisons, it is never actually dereferenced; and even in the
    comparisons, it is unlikely that a file from another filesystem would have
    a ->private_data pointer that happens to also be valid in this context.
    But ->private_data is not always guaranteed to be a valid pointer to an
    object owned by the file's filesystem; for example, some filesystems just
    cram numbers in there.

    Check the type of the supplied file descriptor to be safe, analogous to how
    other places in the kernel do it.

    Fixes: 88314e4dda1e ("RDMA/cma: add support for rdma_migrate_id()")
    Signed-off-by: Jann Horn <jannh@google.com>
    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
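The type check described above can be sketched as comparing the file's operations table against the expected one before touching private_data. Everything here is a hypothetical userspace model (`file_sketch`, `file_ops_sketch`, `to_ucma_file`); it shows the pattern, not the kernel's actual structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct file and struct file_operations. */
struct file_ops_sketch { const char *name; };

struct file_sketch {
    const struct file_ops_sketch *f_op;
    void *private_data;
};

struct ucma_file_sketch { int id; };

static const struct file_ops_sketch ucma_fops_sketch = { "ucma" };
static const struct file_ops_sketch other_fops_sketch = { "other" };

/*
 * Interpret private_data as a ucma file only when the ops table proves the
 * file really is one; otherwise report a mismatch with NULL.
 */
static struct ucma_file_sketch *to_ucma_file(const struct file_sketch *f)
{
    if (f == NULL || f->f_op != &ucma_fops_sketch)
        return NULL;
    return f->private_data;
}
```

A file from another subsystem may store anything in private_data, including a plain number, so the ops comparison has to come first.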
* mm, oom: distinguish blockable mode for mmu notifiersMichal Hocko2018-08-221-7/+26
    There are several blockable mmu notifiers which might sleep in
    mmu_notifier_invalidate_range_start and that is a problem for the
    oom_reaper because it needs to guarantee a forward progress so it cannot
    depend on any sleepable locks.

    Currently we simply back off and mark an oom victim with blockable mmu
    notifiers as done after a short sleep. That can result in selecting a new
    oom victim prematurely because the previous one still hasn't torn its
    memory down yet.

    We can do much better though. Even if mmu notifiers use sleepable locks
    there is no reason to automatically assume those locks are held. Moreover
    majority of notifiers only care about a portion of the address space and
    there is absolutely zero reason to fail when we are unmapping an unrelated
    range. Many notifiers do really block and wait for HW which is harder to
    handle and we have to bail out though.

    This patch handles the low hanging fruit.
    __mmu_notifier_invalidate_range_start gets a blockable flag and callbacks
    are not allowed to sleep if the flag is set to false. This is achieved by
    using trylock instead of the sleepable lock for most callbacks and continue
    as long as we do not block down the call chain.

    I think we can improve that even further because there is a common pattern
    to do a range lookup first and then do something about that. The first part
    can be done without a sleeping lock in most cases AFAICS.

    The oom_reaper end then simply retries if there is at least one notifier
    which couldn't make any progress in !blockable mode. A retry loop is
    already implemented to wait for the mmap_sem and this is basically the same
    thing.

    The simplest way for driver developers to test this code path is to wrap
    userspace code which uses these notifiers into a memcg and set the hard
    limit to hit the oom. This can be done e.g. after the test faults in all
    the mmu notifier managed memory and set the hard limit to something really
    small. Then we are looking for a proper process tear down.

    [akpm@linux-foundation.org: coding style fixes]
    [akpm@linux-foundation.org: minor code simplification]
    Link: http://lkml.kernel.org/r/20180716115058.5559-1-mhocko@kernel.org
    Signed-off-by: Michal Hocko <mhocko@suse.com>
    Acked-by: Christian König <christian.koenig@amd.com> # AMD notifiers
    Acked-by: Leon Romanovsky <leonro@mellanox.com> # mlx and umem_odp
    Reported-by: David Rientjes <rientjes@google.com>
    Cc: "David (ChunMing) Zhou" <David1.Zhou@amd.com>
    Cc: Paolo Bonzini <pbonzini@redhat.com>
    Cc: Alex Deucher <alexander.deucher@amd.com>
    Cc: David Airlie <airlied@linux.ie>
    Cc: Jani Nikula <jani.nikula@linux.intel.com>
    Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
    Cc: Doug Ledford <dledford@redhat.com>
    Cc: Jason Gunthorpe <jgg@ziepe.ca>
    Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
    Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
    Cc: Sudeep Dutt <sudeep.dutt@intel.com>
    Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
    Cc: Dimitri Sivanich <sivanich@sgi.com>
    Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
    Cc: Juergen Gross <jgross@suse.com>
    Cc: "Jérôme Glisse" <jglisse@redhat.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: Felix Kuehling <felix.kuehling@amd.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
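The "trylock when not blockable" idea from the message above can be sketched as follows. This is a userspace model under assumptions: `notifier_sketch` and `invalidate_range_start_sketch` are hypothetical names, a C11 `atomic_flag` stands in for the sleepable lock, and returning `-EAGAIN` on contention is the convention chosen for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical notifier state guarded by a lock. */
struct notifier_sketch {
    atomic_flag lock;
    int invalidated;   /* number of ranges invalidated so far */
};

/*
 * In blockable mode, wait for the lock (spinning here stands in for sleeping).
 * In non-blockable mode, try once and report -EAGAIN so the caller can retry.
 */
static int invalidate_range_start_sketch(struct notifier_sketch *n, bool blockable)
{
    if (blockable) {
        while (atomic_flag_test_and_set(&n->lock))
            ; /* wait until the holder releases the lock */
    } else if (atomic_flag_test_and_set(&n->lock)) {
        return -EAGAIN; /* lock is contended and we must not wait */
    }

    n->invalidated++;
    atomic_flag_clear(&n->lock);
    return 0;
}
```

A caller in the position of the oom_reaper would loop on `-EAGAIN`, much like the retry loop the message mentions for mmap_sem.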
* Merge branch 'linus/master' into rdma.git for-nextJason Gunthorpe2018-08-161-1/+1
|\
    rdma.git merge resolution for the 4.19 merge window

    Conflicts:
     drivers/infiniband/core/rdma_core.c
       - Use the rdma code and revise with the new spelling for
         atomic_fetch_add_unless
     drivers/nvme/host/rdma.c
       - Replace max_sge with max_send_sge in new blk code
     drivers/nvme/target/rdma.c
       - Use the blk code and revise to use NULL for ib_post_recv when
         appropriate
       - Replace max_sge with max_recv_sge in new blk code
     net/rds/ib_send.c
       - Use the net code and revise to use NULL for ib_post_recv when
         appropriate

    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * Merge branch 'locking-core-for-linus' of ↵Linus Torvalds2018-08-131-1/+1
| |\
    git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

    Pull locking/atomics update from Thomas Gleixner:
     "The locking, atomics and memory model brains delivered:

       - A larger update to the atomics code which reworks the ordering
         barriers, consolidates the atomic primitives, provides the new
         atomic64_fetch_add_unless() primitive and cleans up the include hell.

       - Simplify cmpxchg() instrumentation and add instrumentation for
         xchg() and cmpxchg_double().

       - Updates to the memory model and documentation"

    * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (48 commits)
      locking/atomics: Rework ordering barriers
      locking/atomics: Instrument cmpxchg_double*()
      locking/atomics: Instrument xchg()
      locking/atomics: Simplify cmpxchg() instrumentation
      locking/atomics/x86: Reduce arch_cmpxchg64*() instrumentation
      tools/memory-model: Rename litmus tests to comply to norm7
      tools/memory-model/Documentation: Fix typo, smb->smp
      sched/Documentation: Update wake_up() & co. memory-barrier guarantees
      locking/spinlock, sched/core: Clarify requirements for smp_mb__after_spinlock()
      sched/core: Use smp_mb() in wake_woken_function()
      tools/memory-model: Add informal LKMM documentation to MAINTAINERS
      locking/atomics/Documentation: Describe atomic_set() as a write operation
      tools/memory-model: Make scripts executable
      tools/memory-model: Remove ACCESS_ONCE() from model
      tools/memory-model: Remove ACCESS_ONCE() from recipes
      locking/memory-barriers.txt/kokr: Update Korean translation to fix broken DMA vs. MMIO ordering example
      MAINTAINERS: Add Daniel Lustig as an LKMM reviewer
      tools/memory-model: Fix ISA2+pooncelock+pooncelock+pombonce name
      tools/memory-model: Add litmus test for full multicopy atomicity
      locking/refcount: Always allow checked forms
      ...
| | * Merge tag 'v4.18-rc5' into locking/core, to pick up fixesIngo Molnar2018-07-171-11/+17
| | |\
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | * | atomics/treewide: Rename __atomic_add_unless() => atomic_fetch_add_unless()Mark Rutland2018-06-211-1/+1
    While __atomic_add_unless() was originally intended as a building-block for
    atomic_add_unless(), it's now used in a number of places around the kernel.
    It's the only common atomic operation named __atomic*(), rather than
    atomic_*(), and for consistency it would be better named
    atomic_fetch_add_unless().

    This lack of consistency is slightly confusing, and gets in the way of
    scripting atomics. Given that, let's clean things up and promote it to an
    official part of the atomics API, in the form of atomic_fetch_add_unless().

    This patch converts definitions and invocations over to the new name,
    including the instrumented version, using the following script:

    ----
    git grep -w __atomic_add_unless | while read line; do
    sed -i '{s/\<__atomic_add_unless\>/atomic_fetch_add_unless/}' "${line%%:*}";
    done
    git grep -w __arch_atomic_add_unless | while read line; do
    sed -i '{s/\<__arch_atomic_add_unless\>/arch_atomic_fetch_add_unless/}' "${line%%:*}";
    done
    ----

    Note that we do not have atomic{64,_long}_fetch_add_unless(), which will be
    introduced by later patches.

    There should be no functional change as a result of this patch.

    Signed-off-by: Mark Rutland <mark.rutland@arm.com>
    Reviewed-by: Will Deacon <will.deacon@arm.com>
    Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
    Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Acked-by: Palmer Dabbelt <palmer@sifive.com>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Link: https://lore.kernel.org/lkml/20180621121321.4761-2-mark.rutland@arm.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
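For readers unfamiliar with the "fetch ... unless" semantics being renamed above, here is a userspace sketch using C11 atomics. The function name `fetch_add_unless_sketch` is hypothetical; the sketch only models the behaviour (add `a` to the counter unless it currently equals `u`, and return the value seen beforehand) and is not the kernel implementation.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Add a to *v unless *v == u. Returns the value observed before the attempt,
 * so callers can tell whether the addition happened (result != u).
 */
static int fetch_add_unless_sketch(atomic_int *v, int a, int u)
{
    int old = atomic_load(v);

    while (old != u) {
        /* On failure, old is refreshed with the current value of *v. */
        if (atomic_compare_exchange_weak(v, &old, old + a))
            break;
    }
    return old;
}
```

A typical use is taking a reference only if the count is not already zero: `fetch_add_unless_sketch(&cnt, 1, 0) != 0` means the reference was taken.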
* | | | Merge tag 'v4.18' into rdma.git for-nextJason Gunthorpe2018-08-162-21/+79
|\| | |
    Resolve merge conflicts from the -rc cycle against the rdma.git tree:

    Conflicts:
     drivers/infiniband/core/uverbs_cmd.c
      - New ifs added to ib_uverbs_ex_create_flow in -rc and for-next
      - Merge removal of file->ucontext in for-next with new code in -rc
     drivers/infiniband/core/uverbs_main.c
      - for-next removed code from ib_uverbs_write() that was modified
        in for-rc

    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * | | RDMA/uverbs: Expand primary and alt AV port checksJack Morgenstein2018-07-241-5/+54
| | |/
| |/|
    The commit cited below checked that the port numbers provided in the
    primary and alt AVs are legal.

    That is sufficient to prevent a kernel panic. However, it is not
    sufficient for correct operation.

    In Linux, AVs (both primary and alt) must be completely self-described.
    We do not accept an AV from userspace without an embedded port number.
    (This has been the case since kernel 3.14 commit dbf727de7440
    ("IB/core: Use GID table in AH creation and dmac resolution")).

    For the primary AV, this embedded port number must match the port number
    specified with IB_QP_PORT.

    We also expect the port number embedded in the alt AV to match the
    alt_port_num value passed by the userspace driver in the modify_qp command
    base structure.

    Add these checks to modify_qp.

    Cc: <stable@vger.kernel.org> # 4.16
    Fixes: 5d4c05c3ee36 ("RDMA/uverbs: Sanitize user entered port numbers prior to access it")
    Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
    Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * | RDMA/uverbs: Don't fail in creation of multiple flowsLeon Romanovsky2018-07-031-1/+1
    The conversion from offsetof() calculations to sizeof() behaved incorrectly
    when the remaining size matched exactly and in scenarios with more than one
    flow.

    In such a scenario we got the "create flow failed, flow 10: 8 bytes left
    from uverb cmd" error, which is wrong because the size of kern_spec is
    exactly 8 bytes, and we were not supposed to fail.

    Cc: <stable@vger.kernel.org> # 3.12
    Fixes: 4fae7f170416 ("RDMA/uverbs: Fix slab-out-of-bounds in ib_uverbs_ex_create_flow")
    Reported-by: Ran Rozenstein <ranro@mellanox.com>
    Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * | RDMA/uverbs: Fix slab-out-of-bounds in ib_uverbs_ex_create_flowLeon Romanovsky2018-06-251-11/+12
    The check of cmd.flow_attr.size should take into account the size of the
    reserved field (2 bytes), otherwise the user can provide a size which will
    cause the slab-out-of-bounds warning below.

    ==================================================================
    BUG: KASAN: slab-out-of-bounds in ib_uverbs_ex_create_flow+0x1740/0x1d00
    Read of size 2 at addr ffff880068dff1a6 by task syz-executor775/269

    CPU: 0 PID: 269 Comm: syz-executor775 Not tainted 4.18.0-rc1+ #245
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
    Call Trace:
     dump_stack+0xef/0x17e
     print_address_description+0x83/0x3b0
     kasan_report+0x18d/0x4d0
     ib_uverbs_ex_create_flow+0x1740/0x1d00
     ib_uverbs_write+0x923/0x1010
     __vfs_write+0x10d/0x720
     vfs_write+0x1b0/0x550
     ksys_write+0xc6/0x1a0
     do_syscall_64+0xa7/0x590
     entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x433899
    Code: fd ff 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b 91 fd ff c3 66 2e 0f 1f 84 00 00 00 00
    RSP: 002b:00007ffc2724db58 EFLAGS: 00000217 ORIG_RAX: 0000000000000001
    RAX: ffffffffffffffda RBX: 0000000020006880 RCX: 0000000000433899
    RDX: 00000000000000e0 RSI: 0000000020002480 RDI: 0000000000000003
    RBP: 00000000006d7018 R08: 00000000004002f8 R09: 00000000004002f8
    R10: 00000000004002f8 R11: 0000000000000217 R12: 0000000000000000
    R13: 000000000040cd20 R14: 000000000040cdb0 R15: 0000000000000006

    Allocated by task 269:
     kasan_kmalloc+0xa0/0xd0
     __kmalloc+0x1a9/0x510
     ib_uverbs_ex_create_flow+0x26c/0x1d00
     ib_uverbs_write+0x923/0x1010
     __vfs_write+0x10d/0x720
     vfs_write+0x1b0/0x550
     ksys_write+0xc6/0x1a0
     do_syscall_64+0xa7/0x590
     entry_SYSCALL_64_after_hwframe+0x49/0xbe

    Freed by task 0:
     __kasan_slab_free+0x12e/0x180
     kfree+0x159/0x630
     detach_buf+0x559/0x7a0
     virtqueue_get_buf_ctx+0x3cc/0xab0
     virtblk_done+0x1eb/0x3d0
     vring_interrupt+0x16d/0x2b0
     __handle_irq_event_percpu+0x10a/0x980
     handle_irq_event_percpu+0x77/0x190
     handle_irq_event+0xc6/0x1a0
     handle_edge_irq+0x211/0xd80
     handle_irq+0x3d/0x60
     do_IRQ+0x9b/0x220

    The buggy address belongs to the object at ffff880068dff180
     which belongs to the cache kmalloc-64 of size 64
    The buggy address is located 38 bytes inside of
     64-byte region [ffff880068dff180, ffff880068dff1c0)
    The buggy address belongs to the page:
    page:ffffea0001a37fc0 count:1 mapcount:0 mapping:ffff88006c401780 index:0x0
    flags: 0x4000000000000100(slab)
    raw: 4000000000000100 ffffea0001a31100 0000001100000011 ffff88006c401780
    raw: 0000000000000000 00000000802a002a 00000001ffffffff 0000000000000000
    page dumped because: kasan: bad access detected

    Memory state around the buggy address:
     ffff880068dff080: fb fb fb fb fc fc fc fc fb fb fb fb fb fb fb fb
     ffff880068dff100: fc fc fc fc fb fb fb fb fb fb fb fb fc fc fc fc
    >ffff880068dff180: 00 00 00 00 07 fc fc fc fc fc fc fc fb fb fb fb
                                   ^
     ffff880068dff200: fb fb fb fb fc fc fc fc 00 00 00 00 00 00 fc fc
     ffff880068dff280: fc fc fc fc 00 00 00 00 00 00 00 00 fc fc fc fc
    ==================================================================

    Cc: <stable@vger.kernel.org> # 3.12
    Fixes: f88482743872 ("IB/core: clarify overflow/underflow checks on ib_create/destroy_flow")
    Cc: syzkaller <syzkaller@googlegroups.com>
    Reported-by: Noa Osherovich <noaos@mellanox.com>
    Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * | RDMA/uverbs: Protect from attempts to create flows on unsupported QPLeon Romanovsky2018-06-251-0/+5
| |/
    Flows can be created on UD and RAW_PACKET QP types. Attempts to provide
    other QP types as input cause various unpredictable failures.

    The reason is that in order to support all the various types (e.g. XRC),
    we are supposed to use the real_qp handle and not the qp handle, and expect
    the driver/FW to fail such (XRC) flows. The simpler and safer variant is to
    ban all QP types except UD and RAW_PACKET, instead of relying on the
    driver/FW.

    Cc: <stable@vger.kernel.org> # 3.11
    Fixes: 436f2ad05a0b ("IB/core: Export ib_create/destroy_flow through uverbs")
    Cc: syzkaller <syzkaller@googlegroups.com>
    Reported-by: Noa Osherovich <noaos@mellanox.com>
    Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds2018-06-212-10/+18
| |\
    Pull rdma fixes from Jason Gunthorpe:
     "Here are eight fairly small fixes collected over the last two weeks.

      Regression and crashing bug fixes:

       - mlx4/5: Fixes for issues found from various checkers

       - A resource tracking and uverbs regression in the core code

       - qedr: NULL pointer regression found during testing

       - rxe: Various small bugs"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
      IB/rxe: Fix missing completion for mem_reg work requests
      RDMA/core: Save kernel caller name when creating CQ using ib_create_cq()
      IB/uverbs: Fix ordering of ucontext check in ib_uverbs_write
      IB/mlx4: Fix an error handling path in 'mlx4_ib_rereg_user_mr()'
      RDMA/qedr: Fix NULL pointer dereference when running over iWARP without RDMA-CM
      IB/mlx5: Fix return value check in flow_counters_set_data()
      IB/mlx5: Fix memory leak in mlx5_ib_create_flow
      IB/rxe: avoid double kfree skb
| | * RDMA/core: Save kernel caller name when creating CQ using ib_create_cq()Bharat Potnuri2018-06-181-6/+8
    A few kernel applications like SCST-iSER create a CQ using ib_create_cq(),
    where accessing CQ structures using the rdma restrack tool leads to the
    NULL pointer dereference below. This patch saves the caller kernel module
    name, similar to ib_alloc_cq().

    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [<ffffffff8132ca70>] skip_spaces+0x30/0x30
    PGD 738bac067 PUD 8533f0067 PMD 0
    Oops: 0000 [#1] SMP
    R10: ffff88017fc03300 R11: 0000000000000246 R12: 0000000000000000
    R13: ffff88082fa5a668 R14: ffff88017475a000 R15: 0000000000000000
    FS: 00002b32726582c0(0000) GS:ffff88087fc40000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000000 CR3: 00000008491a1000 CR4: 00000000003607e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
     [<ffffffffc05af69c>] ? fill_res_name_pid+0x7c/0x90 [ib_core]
     [<ffffffffc05af79f>] fill_res_cq_entry+0xef/0x170 [ib_core]
     [<ffffffffc05af4c4>] res_get_common_dumpit+0x3c4/0x480 [ib_core]
     [<ffffffffc05af5d3>] nldev_res_get_cq_dumpit+0x13/0x20 [ib_core]
     [<ffffffff815bc1e7>] netlink_dump+0x117/0x2e0
     [<ffffffff815bcb8b>] __netlink_dump_start+0x1ab/0x230
     [<ffffffffc059fead>] ibnl_rcv_msg+0x11d/0x1f0 [ib_core]
     [<ffffffffc05af5c0>] ? nldev_res_get_mr_dumpit+0x20/0x20 [ib_core]
     [<ffffffffc059fd90>] ? rdma_nl_multicast+0x30/0x30 [ib_core]
     [<ffffffff815bea49>] netlink_rcv_skb+0xa9/0xc0
     [<ffffffffc05a0018>] ibnl_rcv+0x98/0xb0 [ib_core]
     [<ffffffff815be132>] netlink_unicast+0xf2/0x1b0
     [<ffffffff815be50f>] netlink_sendmsg+0x31f/0x6a0
     [<ffffffff8156b580>] sock_sendmsg+0xb0/0xf0
     [<ffffffff816ace9e>] ? _raw_spin_unlock_bh+0x1e/0x20
     [<ffffffff8156f998>] ? release_sock+0x118/0x170
     [<ffffffff8156b731>] SYSC_sendto+0x121/0x1c0
     [<ffffffff81568340>] ? sock_alloc_file+0xa0/0x140
     [<ffffffff81221265>] ? __fd_install+0x25/0x60
     [<ffffffff8156c2ce>] SyS_sendto+0xe/0x10
     [<ffffffff816b6c2a>] system_call_fastpath+0x16/0x1b
    RIP [<ffffffff8132ca70>] skip_spaces+0x30/0x30
     RSP <ffff88072be97760>
    CR2: 0000000000000000

    Cc: <stable@vger.kernel.org>
    Fixes: f66c8ba4c9fa ("RDMA/core: Save kernel caller name when creating PD and CQ objects")
    Reviewed-by: Steve Wise <swise@opengridcomputing.com>
    Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
    Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| | * IB/uverbs: Fix ordering of ucontext check in ib_uverbs_writeJason Gunthorpe2018-06-121-4/+10
    During disassociation the ucontext will become NULL, however due to how the
    SRCU locking works the ucontext must only be examined after looking at the
    ib_dev, which governs the RCU control flow.

    With the wrong ordering userspace will see EINVAL instead of EIO for a
    disassociated uverbs FD, which breaks rdma-core.

    Cc: stable@vger.kernel.org
    Fixes: 491d5c6a3023 ("RDMA/uverbs: Move uncontext check before SRCU read lock")
    Reported-by: Mark Bloch <markb@mellanox.com>
    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
* | | IB/core: Change filter function return type from int to boolParav Pandit2018-08-152-31/+43
    Filter functions return either 0 or 1, so change their return type from int
    to bool to reflect that.

    Additionally, some filter functions have the _filter suffix and some don't.
    Make all filter functions consistently use the _filter suffix to improve
    code readability.

    Signed-off-by: Parav Pandit <parav@mellanox.com>
    Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | | IB/core: Update GID entries for netdevice whose mac address changesParav Pandit2018-08-151-6/+7
    Update all GID table entries of the netdevice whose MAC address changed.

    Signed-off-by: Parav Pandit <parav@mellanox.com>
    Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | | IB/core: Add default GIDs of the bond master netdevParav Pandit2018-08-151-29/+59
    Currently the following issues exist:

    1. Default GIDs of the lower (slave) netdevice are added when the bond
       netdevice is added. Instead, the default GIDs should be those of the
       bond master netdevice.
    2. Due to this, when a failover event occurs, the FAILOVER event handler
       attempts to delete the GID of the upper device and tries to add the
       default GID of the lower device. This is incorrect behavior.

    To have simple and correct code:

    (a) Split default GIDs addition out of add_netdev_ips(). This allows
        easier removal in future if RoCE default GIDs are removed.
    (b) Add default GIDs of the bond master device by using the right filter
        and callback function.
    (c) Remove the unused function enum_netdev_default_gids().

    Signed-off-by: Parav Pandit <parav@mellanox.com>
    Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | | IB/core: Consider adding default GIDs of bond deviceParav Pandit2018-08-151-1/+23
    Now that we correctly delete the default GIDs of lower devices during the
    CHANGEUPPER event, add default GIDs of the bonding master device.

    Signed-off-by: Parav Pandit <parav@mellanox.com>
    Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | | IB/core: Delete lower netdevice default GID entries in bonding scenarioParav Pandit2018-08-151-9/+62
    When a NETDEV_CHANGEUPPER event occurs, the lower device is not yet
    established as a slave of the master, and when the upper device is a bond
    device, the default GID entries are not deleted. Due to this, when the bond
    device is fully configured, default GID entries of the bond device cannot
    be added because the default GID entries are occupied by the lower
    netdevice. This is incorrect.

    Default GID entries should really be those of the bond netdevice, because
    in all RoCE GIDs (default or IP), the MAC address of the bond device will
    be used. It is confusing to have a default GID of a netdevice which is not
    really used for any purpose.

    Therefore, as a first step, implement:

    (a) a filter function which checks whether, for a CHANGEUPPER event, the
        associated upper device of the event netdevice is a master device or
        not;
    (b) a callback function which deletes the default GIDs of the lower
        (event) netdevice.

    Signed-off-by: Parav Pandit <parav@mellanox.com>
    Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>