summaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/hw/qib
Commit message (Collapse)AuthorAgeFilesLines
* fs: Limit sys_mount to only request filesystem modules.Eric W. Biederman2013-03-031-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Modify the request_module to prefix the file system type with "fs-" and add aliases to all of the filesystems that can be built as modules to match. A common practice is to build all of the kernel code and leave code that is not commonly needed as modules, with the result that many users are exposed to any bug anywhere in the kernel. Looking for filesystems with a fs- prefix limits the pool of possible modules that can be loaded by mount to just filesystems trivially making things safer with no real cost. Using aliases means user space can control the policy of which filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf with blacklist and alias directives. Allowing simple, safe, well understood work-arounds to known problematic software. This also addresses a rare but unfortunate problem where the filesystem name is not the same as it's module name and module auto-loading would not work. While writing this patch I saw a handful of such cases. The most significant being autofs that lives in the module autofs4. This is relevant to user namespaces because we can reach the request module in get_fs_type() without having any special permissions, and people get uncomfortable when a user specified string (in this case the filesystem type) goes all of the way to request_module. After having looked at this issue I don't think there is any particular reason to perform any filtering or permission checks beyond making it clear in the module request that we want a filesystem module. The common pattern in the kernel is to call request_module() without regards to the users permissions. In general all a filesystem module does once loaded is call register_filesystem() and go to sleep. Which means there is not much attack surface exposed by loading a filesytem module unless the filesystem is mounted. In a user namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT, which most filesystems do not set today. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Acked-by: Kees Cook <keescook@chromium.org> Reported-by: Kees Cook <keescook@google.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
* IB/qib: convert to idr_alloc()Tejun Heo2013-02-271-13/+8
| | | | | | | | | Convert to the much saner new idr interface. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Mike Marciniszyn <infinipath@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'for-linus' of ↵Linus Torvalds2013-02-262-3/+3
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs pile (part one) from Al Viro: "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent locking violations, etc. The most visible changes here are death of FS_REVAL_DOT (replaced with "has ->d_weak_revalidate()") and a new helper getting from struct file to inode. Some bits of preparation to xattr method interface changes. Misc patches by various people sent this cycle *and* ocfs2 fixes from several cycles ago that should've been upstream right then. PS: the next vfs pile will be xattr stuff." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits) saner proc_get_inode() calling conventions proc: avoid extra pde_put() in proc_fill_super() fs: change return values from -EACCES to -EPERM fs/exec.c: make bprm_mm_init() static ocfs2/dlm: use GFP_ATOMIC inside a spin_lock ocfs2: fix possible use-after-free with AIO ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero target: writev() on single-element vector is pointless export kernel_write(), convert open-coded instances fs: encode_fh: return FILEID_INVALID if invalid fid_type kill f_vfsmnt vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op nfsd: handle vfs_getattr errors in acl protocol switch vfs_getattr() to struct path default SET_PERSONALITY() in linux/elf.h ceph: prepopulate inodes only when request is aborted d_hash_and_lookup(): export, switch open-coded instances 9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate() 9p: split dropping the acls from v9fs_set_create_acl() ...
| * new helper: file_inode(file)Al Viro2013-02-222-3/+3
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | IB/qib: Fix QP locate/remove raceMike Marciniszyn2013-02-141-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | remove_qp() can execute concurrently with a qib_lookup_qpn() on another CPU, which in of itself, is ok, given the RCU locking. The issue is that remove_qp() NULLs out the qp->next field so that a qib_lookup_qpn() might fail to find a qp if it occurs after the one that is being deleted. This is a momentary issue and subsequent qib_lookup_qpn() calls would find the qp's since the search restarts from the bucket head. At scale, the issue might causes dropped packets and unnecessary retransmissions. The fix just deletes the qp->next NULL assignment to prevent the remove_qp() from hiding qp's from qib_lookup_qpn(). Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* | IB/qib: Fix for broken sparse warning fixMike Marciniszyn2013-02-051-8/+3
|/ | | | | | | | | | | | | Commit 1fb9fed6d489 ("IB/qib: Fix QP RCU sparse warning") broke QP hash list deletion in qp_remove() badly. This patch restores the former for loop behavior, while still fixing the sparse warnings. Cc: <stable@vger.kernel.org> Reviewed-by: Gary Leshner <gary.s.leshner@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* Drivers: infinband: remove __dev* attributes.Greg Kroah-Hartman2013-01-031-7/+5
| | | | | | | | | | | | | | | | | | | | | | | CONFIG_HOTPLUG is going away as an option. As a result, the __dev* markings need to be removed. This change removes the use of __devinit, __devexit_p, __devinitdata, and __devexit from these drivers. Based on patches originally written by Bill Pemberton, but redone by me in order to handle some of the coding style issues better, by hand. Cc: Bill Pemberton <wfp5p@virginia.edu> Cc: Tom Tucker <tom@opengridcomputing.com> Cc: Steve Wise <swise@opengridcomputing.com> Cc: Roland Dreier <roland@kernel.org> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: Hoang-Nam Nguyen <hnguyen@de.ibm.com> Cc: Christoph Raisch <raisch@de.ibm.com> Cc: Mike Marciniszyn <infinipath@intel.com> Cc: Faisal Latif <faisal.latif@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* mm: kill vma flag VM_RESERVED and mm->reserved_vm counterKonstantin Khlebnikov2012-10-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A long time ago, in v2.4, VM_RESERVED kept swapout process off VMA, currently it lost original meaning but still has some effects: | effect | alternative flags -+------------------------+--------------------------------------------- 1| account as reserved_vm | VM_IO 2| skip in core dump | VM_IO, VM_DONTDUMP 3| do not merge or expand | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP 4| do not mlock | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP This patch removes reserved_vm counter from mm_struct. Seems like nobody cares about it, it does not exported into userspace directly, it only reduces total_vm showed in proc. Thus VM_RESERVED can be replaced with VM_IO or pair VM_DONTEXPAND | VM_DONTDUMP. remap_pfn_range() and io_remap_pfn_range() set VM_IO|VM_DONTEXPAND|VM_DONTDUMP. remap_vmalloc_range() set VM_DONTEXPAND | VM_DONTDUMP. [akpm@linux-foundation.org: drivers/vfio/pci/vfio_pci.c fixup] Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Eric Paris <eparis@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Kentaro Takeda <takedakn@nttdata.co.jp> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Venkatesh Pallipadi <venki@google.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge tag 'rdma-for-linus' of ↵Linus Torvalds2012-10-024-5/+19
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull infiniband updates from Roland Dreier: "First batch of InfiniBand/RDMA changes for the 3.7 merge window: - mlx4 IB support for SR-IOV - A couple of SRP initiator fixes - Batch of nes hardware driver fixes - Fix for long-standing use-after-free crash in IPoIB - Other miscellaneous fixes" This merge also removes a new use of __cancel_delayed_work(), and replaces it with the regular cancel_delayed_work() that is now irq-safe thanks to the workqueue updates. That said, I suspect the sequence in question should probably use "mod_delayed_work()". I just did the minimal "don't use deprecated functions" fixup, though. * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (45 commits) IB/qib: Fix local access validation for user MRs mlx4_core: Disable SENSE_PORT for multifunction devices mlx4_core: Clean up enabling of SENSE_PORT for older (ConnectX-1/-2) HCAs mlx4_core: Stash PCI ID driver_data in mlx4_priv structure IB/srp: Avoid having aborted requests hang IB/srp: Fix use-after-free in srp_reset_req() IB/qib: Add a qib driver version RDMA/nes: Fix compilation error when nes_debug is enabled RDMA/nes: Print hardware resource type RDMA/nes: Fix for crash when TX checksum offload is off RDMA/nes: Cosmetic changes RDMA/nes: Fix for incorrect MSS when TSO is on RDMA/nes: Fix incorrect resolving of the loopback MAC address mlx4_core: Fix crash on uninitialized priv->cmd.slave_sem mlx4_core: Trivial cleanups to driver log messages mlx4_core: Trivial readability fix: "0X30" -> "0x30" IB/mlx4: Create paravirt contexts for VFs when master IB driver initializes mlx4: Modify proxy/tunnel QP mechanism so that guests do no calculations mlx4: Paravirtualize Node Guids for slaves mlx4: Activate SR-IOV mode for IB ...
| * IB/qib: Fix local access validation for user MRsMike Marciniszyn2012-10-011-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | Commit 8aac4cc3a9d7 ("IB/qib: RCU locking for MR validation") introduced a bug that broke user post sends. The proper validation of the MR was lost in the patch. This patch corrects that validation. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Roland Dreier <roland@purestorage.com>
| * IB/qib: Add a qib driver versionDean Luick2012-09-303-3/+16
| | | | | | | | | | | | Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* | Merge branch 'for-linus' of ↵Linus Torvalds2012-10-021-2/+2
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull user namespace changes from Eric Biederman: "This is a mostly modest set of changes to enable basic user namespace support. This allows the code to code to compile with user namespaces enabled and removes the assumption there is only the initial user namespace. Everything is converted except for the most complex of the filesystems: autofs4, 9p, afs, ceph, cifs, coda, fuse, gfs2, ncpfs, nfs, ocfs2 and xfs as those patches need a bit more review. The strategy is to push kuid_t and kgid_t values are far down into subsystems and filesystems as reasonable. Leaving the make_kuid and from_kuid operations to happen at the edge of userspace, as the values come off the disk, and as the values come in from the network. Letting compile type incompatible compile errors (present when user namespaces are enabled) guide me to find the issues. The most tricky areas have been the places where we had an implicit union of uid and gid values and were storing them in an unsigned int. Those places were converted into explicit unions. I made certain to handle those places with simple trivial patches. Out of that work I discovered we have generic interfaces for storing quota by projid. I had never heard of the project identifiers before. Adding full user namespace support for project identifiers accounts for most of the code size growth in my git tree. Ultimately there will be work to relax privlige checks from "capable(FOO)" to "ns_capable(user_ns, FOO)" where it is safe allowing root in a user names to do those things that today we only forbid to non-root users because it will confuse suid root applications. While I was pushing kuid_t and kgid_t changes deep into the audit code I made a few other cleanups. I capitalized on the fact we process netlink messages in the context of the message sender. I removed usage of NETLINK_CRED, and started directly using current->tty. Some of these patches have also made it into maintainer trees, with no problems from identical code from different trees showing up in linux-next. After reading through all of this code I feel like I might be able to win a game of kernel trivial pursuit." Fix up some fairly trivial conflicts in netfilter uid/git logging code. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (107 commits) userns: Convert the ufs filesystem to use kuid/kgid where appropriate userns: Convert the udf filesystem to use kuid/kgid where appropriate userns: Convert ubifs to use kuid/kgid userns: Convert squashfs to use kuid/kgid where appropriate userns: Convert reiserfs to use kuid and kgid where appropriate userns: Convert jfs to use kuid/kgid where appropriate userns: Convert jffs2 to use kuid and kgid where appropriate userns: Convert hpfs to use kuid and kgid where appropriate userns: Convert btrfs to use kuid/kgid where appropriate userns: Convert bfs to use kuid/kgid where appropriate userns: Convert affs to use kuid/kgid wherwe appropriate userns: On alpha modify linux_to_osf_stat to use convert from kuids and kgids userns: On ia64 deal with current_uid and current_gid being kuid and kgid userns: On ppc convert current_uid from a kuid before printing. userns: Convert s390 getting uid and gid system calls to use kuid and kgid userns: Convert s390 hypfs to use kuid and kgid where appropriate userns: Convert binder ipc to use kuids userns: Teach security_path_chown to take kuids and kgids userns: Add user namespace support to IMA userns: Convert EVM to deal with kuids and kgids in it's hmac computation ...
| * | userns: Convert ipathfs to use GLOBAL_ROOT_UID and GLOBAL_ROOT_GIDEric W. Biederman2012-09-211-2/+2
| | | | | | | | | | | | | | | | | | Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
* | | Merge tag 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pciLinus Torvalds2012-10-012-25/+17
|\ \ \ | |_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull PCI changes from Bjorn Helgaas: "Host bridge hotplug - Protect acpi_pci_drivers and acpi_pci_roots (Taku Izumi) - Clear host bridge resource info to avoid issue when releasing (Yinghai Lu) - Notify acpi_pci_drivers when hot-plugging host bridges (Jiang Liu) - Use standard list ops for acpi_pci_drivers (Jiang Liu) Device hotplug - Use pci_get_domain_bus_and_slot() to close hotplug races (Jiang Liu) - Remove fakephp driver (Bjorn Helgaas) - Fix VGA ref count in hotplug remove path (Yinghai Lu) - Allow acpiphp to handle PCIe ports without native hotplug (Jiang Liu) - Implement resume regardless of pciehp_force param (Oliver Neukum) - Make pci_fixup_irqs() work after init (Thierry Reding) Miscellaneous - Add pci_pcie_type(dev) and remove pci_dev.pcie_type (Yijing Wang) - Factor out PCI Express Capability accessors (Jiang Liu) - Add pcibios_window_alignment() so powerpc EEH can use generic resource assignment (Gavin Shan) - Make pci_error_handlers const (Stephen Hemminger) - Cleanup drivers/pci/remove.c (Bjorn Helgaas) - Improve Vendor-Specific Extended Capability support (Bjorn Helgaas) - Use standard list ops for bus->devices (Bjorn Helgaas) - Avoid kmalloc in pci_get_subsys() and pci_get_class() (Feng Tang) - Reassign invalid bus number ranges (Intel DP43BF workaround) (Yinghai Lu)" * tag 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (102 commits) PCI: acpiphp: Handle PCIe ports without native hotplug capability PCI/ACPI: Use acpi_driver_data() rather than searching acpi_pci_roots PCI/ACPI: Protect acpi_pci_roots list with mutex PCI/ACPI: Use acpi_pci_root info rather than looking it up again PCI/ACPI: Pass acpi_pci_root to acpi_pci_drivers' add/remove interface PCI/ACPI: Protect acpi_pci_drivers list with mutex PCI/ACPI: Notify acpi_pci_drivers when hot-plugging PCI root bridges PCI/ACPI: Use normal list for struct acpi_pci_driver PCI/ACPI: Use DEVICE_ACPI_HANDLE rather than searching acpi_pci_roots PCI: Fix default vga ref_count ia64/PCI: Clear host bridge aperture struct resource x86/PCI: Clear host bridge aperture struct resource PCI: Stop all children first, before removing all children Revert "PCI: Use hotplug-safe pci_get_domain_bus_and_slot()" PCI: Provide a default pcibios_update_irq() PCI: Discard __init annotations for pci_fixup_irqs() and related functions PCI: Use correct type when freeing bus resource list PCI: Check P2P bridge for invalid secondary/subordinate range PCI: Convert "new_id"/"remove_id" into generic pci_bus driver attributes xen-pcifront: Use hotplug-safe pci_get_domain_bus_and_slot() ...
| * | Merge commit 'v3.6-rc5' into nextBjorn Helgaas2012-09-132-2/+4
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * commit 'v3.6-rc5': (1098 commits) Linux 3.6-rc5 HID: tpkbd: work even if the new Lenovo Keyboard driver is not configured Remove user-triggerable BUG from mpol_to_str xen/pciback: Fix proper FLR steps. uml: fix compile error in deliver_alarm() dj: memory scribble in logi_dj Fix order of arguments to compat_put_time[spec|val] xen: Use correct masking in xen_swiotlb_alloc_coherent. xen: fix logical error in tlb flushing xen/p2m: Fix one-off error in checking the P2M tree directory. powerpc: Don't use __put_user() in patch_instruction powerpc: Make sure IPI handlers see data written by IPI senders powerpc: Restore correct DSCR in context switch powerpc: Fix DSCR inheritance in copy_thread() powerpc: Keep thread.dscr and thread.dscr_inherit in sync powerpc: Update DSCR on all CPUs when writing sysfs dscr_default powerpc/powernv: Always go into nap mode when CPU is offline powerpc: Give hypervisor decrementer interrupts their own handler powerpc/vphn: Fix arch_update_cpu_topology() return value ARM: gemini: fix the gemini build ... Conflicts: drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c drivers/rapidio/devices/tsi721.c
| * \ \ Merge branch 'pci/stephen-const' into nextBjorn Helgaas2012-09-122-2/+2
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * pci/stephen-const: make drivers with pci error handlers const scsi: make pci error handlers const netdev: make pci_error_handlers const PCI: Make pci_error_handlers const
| | * | | make drivers with pci error handlers constStephen Hemminger2012-09-072-2/+2
| | | |/ | | |/| | | | | | | | | | | | | | | | | Covers the rest of the uses of pci error handler. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
| * | | IB/qib: Use PCI Express Capability accessorsJiang Liu2012-08-231-23/+15
| |/ / | | | | | | | | | | | | | | | | | | | | | Use PCI Express Capability access functions to simplify qib driver. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Signed-off-by: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
* | | IB/qib: Fix failure of compliance test C14-024#06_LocalPortNumMike Marciniszyn2012-09-141-1/+2
| |/ |/| | | | | | | | | | | | | | | | | | | | | Commit 3236b2d469db ("IB/qib: MADs with misset M_Keys should return failure") introduced a return code assignment that unfortunately introduced an unconditional exit for the routine due to missing braces. This patch adds the braces to correct the original patch. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| |
| \
*-. \ Merge branches 'cma', 'ipoib', 'misc', 'mlx4', 'ocrdma', 'qib' and 'srp' ↵Roland Dreier2012-08-162-2/+4
|\ \ \ | |_|/ |/| | | | | into for-next
| | * IB/qib: Fix error return code in qib_init_7322_variables()Julia Lawall2012-08-151-1/+3
| |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert a 0 error return code to a negative one, as returned elsewhere in the function. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ identifier ret; expression e,e1,e2,e3,e4,x; @@ ( if (\(ret != 0\|ret < 0\) || ...) { ... return ...; } | ret = 0 ) ... when != ret = e1 *x = \(kmalloc\|kzalloc\|kcalloc\|devm_kzalloc\|ioremap\|ioremap_nocache\|devm_ioremap\|devm_ioremap_nocache\)(...); ... when != x = e2 when != ret = e3 *if (x == NULL || ...) { ... when != ret = e4 * return ret; } // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| * IB: Fix typos in infiniband driversMasanari Iida2012-08-151-1/+1
|/ | | | | | | Correct spelling typos in comments in drivers/infiniband. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* IB/qib: Fix size of cc_supported_table_entriesMike Marciniszyn2012-07-291-5/+5
| | | | | | | | | | | | | Commit 36a8f01cd24b ("IB/qib: Add congestion control agent implementation") tries to store the value 1984 in a u8, which leads to truncation. Fix this by making the member big enough. This bug was detected by a smatch warning. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Ramkrishna Vepa <ramkrishna.vepa@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* IB/qib: checkpatch fixesMike Marciniszyn2012-07-1916-380/+394
| | | | | | | | | | | | | Elminate some simple_strto* usage. checkpatch also noted pr_ conversations, which have been done as recommended. The pr_fmt() define is used to shorten line length. Other multi-line string warnings are also elmininated. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* IB/qib: Add congestion control agent implementationMike Marciniszyn2012-07-195-12/+790
| | | | | | | | | Add a congestion control agent in the driver that handles gets and sets from the congestion control manager in the fabric for the Performance Scale Messaging (PSM) library. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* IB/qib: Reduce sdma_lock contentionMike Marciniszyn2012-07-194-12/+85
| | | | | | | | | | | | | | | | | | | | | | | | | Profiling has shown that sdma_lock is proving a bottleneck for performance. The situations include: - RDMA reads when krcvqs > 1 - post sends from multiple threads For RDMA read the current global qib_wq mechanism runs on all CPUs and contends for the sdma_lock when multiple RMDA read requests are fielded on differenct CPUs. For post sends, the direct call to qib_do_send() from multiple threads causes the contention. Since the sdma mechanism is per port, this fix converts the existing workqueue to a per port single thread workqueue to reduce the lock contention in the RDMA read case, and for any other case where the QP is scheduled via the workqueue mechanism from more than 1 CPU. For the post send case, This patch modifies the post send code to test for a non empty sdma engine. If the sdma is not idle the (now single thread) workqueue will be used to trigger the send engine instead of the direct call to qib_do_send(). Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* IB/qib: Fix an incorrect log messageBetty Dall2012-07-191-1/+1
| | | | | | | | | There is a cut-and-paste typo in the function qib_pci_slot_reset() where it prints that the "link_reset" function is called rather than the "slot_reset" function. This makes the message misleading. Signed-off-by: Betty Dall <betty.dall@hp.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* IB/qib: Fix QP RCU sparse warningsMike Marciniszyn2012-07-175-35/+61
| | | | | | | | | | Commit af061a644a0e ("IB/qib: Use RCU for qpn lookup") introduced sparse warnings. This patch corrects those issues. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* IB/qib: Fix sparse RCU warnings in qib_keys.cMike Marciniszyn2012-07-102-3/+5
| | | | | | | | | Commit 8aac4cc3a9d7 ("IB/qib: RCU locking for MR validation") introduced new sparse warnings in qib_keys.c. Acked-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* IB/qib: RCU locking for MR validationMike Marciniszyn2012-07-084-50/+66
| | | | | | | | | | | | Profiling indicates that MR validation locking is expensive. The MR table is largely read-only and is a suitable candidate for RCU locking. The patch uses RCU locking during validation to eliminate one lock/unlock during that validation. Reviewed-by: Mike Heinz <michael.william.heinz@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* IB/qib: Avoid returning EBUSY from MR deregisterMike Marciniszyn2012-07-089-224/+244
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A timing issue can occur where qib_mr_dereg can return -EBUSY if the MR use count is not zero. This can occur if the MR is de-registered while RDMA read response packets are being progressed from the SDMA ring. The suspicion is that the peer sent an RDMA read request, which has already been copied across to the peer. The peer sees the completion of his request and then communicates to the responder that the MR is not needed any longer. The responder tries to de-register the MR, catching some responses remaining in the SDMA ring holding the MR use count. The code now uses a get/put paradigm to track MR use counts and coordinates with the MR de-registration process using a completion when the count has reached zero. A timeout on the delay is in place to catch other EBUSY issues. The reference count protocol is as follows: - The return to the user counts as 1 - A reference from the lk_table or the qib_ibdev counts as 1. - Transient I/O operations increase/decrease as necessary A lot of code duplication has been folded into the new routines init_qib_mregion() and deinit_qib_mregion(). Additionally, explicit initialization of fields to zero is now handled by kzalloc(). Also, duplicated code 'while.*num_sge' that decrements reference counts have been consolidated in qib_put_ss(). Reviewed-by: Ramkrishna Vepa <ramkrishna.vepa@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* IB/qib: Fix UC MR refs for immediate operationsMike Marciniszyn2012-07-081-1/+7
| | | | | | | | | | | | | | An MR reference leak exists when handling UC RDMA writes with immediate data because we manipulate the reference counts as if the operation had been a send. This patch moves the last_imm label so that the RDMA write operations with immediate data converge at the cq building code. The copy/mr deref code is now done correctly prior to the branch to last_imm. Reviewed-by: Edward Mascarenhas <edward.mascarenhas@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* IB/qib: Add cache line awareness to qib_qp and qib_devdata structuresMike Marciniszyn2012-05-147-94/+120
| | | | | | | | | | | | | | This patch reorganizes the QP and devdata files to be more cache line aware. qib_qp fields in particular are split into read-mostly, send, and receive fields. qib_devdata fields are split into read-mostly and read/write fields Testing has show that bidirectional tests improve by as much as 100% with this patch. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* IB/qib: MADs with misset M_Keys should return failureJim Foraker2012-05-141-1/+3
| | | | | | | | | | | | If a MAD is sent directly to the local HCA rather than placed on a QP and the MAD fails M_Key checks, there is no means to generate a timeout for the client, which may hang. Instead we report IB_MAD_RESULT_FAILURE, which operates the same for on-the-wire packets, but will generate a send failure back to the client. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jim Foraker <foraker1@llnl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>
* IB/qib: Fix M_Key lease timeout handlingJim Foraker2012-05-141-15/+29
| | | | | | | | | | | | | | | | | | | | | | | | | If a port has an M_Key lease set, the M_Key protect bits set to 1, and a SubnSet arrives with an invalid M_Key, an M_Key mismatch trap is generated, the lease timer begins as expected, and eventually the M_Key protect bits will be set back to 0 as per the spec. However, if any other SMP with an invalid M_Key arrives, the lease timer is expired and the M_Key protect bits remain in force. This is not according to to spec. In particular, C14-17 says that a lease timer that is underway is not affected by protection level checks (ie, at protection level 1, a SubnGet with a bad M_Key may be successful, but does not stop the timer), and C14-19 says that the timer shall stop when a valid M_Key has been received. C14-19 is the only compliance statement that specifies a stopping condition for the timer. This behavior is magnified if the port's Master SM LID attribute points at itself. In that case, the M_Key mismatch trap is sufficient to expire the timer, and the mkey lease attribute is rendered useless. Reviewed-by: Ram Vepa <ram.vepa@qlogic.com> Signed-off-by: Jim Foraker <foraker1@llnl.gov> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* IB/qib: Fix QLE734X link cyclingMitko Haralanov2012-05-141-1/+1
| | | | | | | | | | The SERDES was using the incorrect Frequency Loop Bandwidth setting causing the link to cycle through the Physical link negotiation state machine. Fixing the Frequency Loop Bandwidth setting in the SERDES helps the link come up faster and more reliably. Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* IB/qib: Display correct value for number of contextsMitko Haralanov2012-05-142-3/+7
| | | | | | | | | | | A "fix" for a bug with the number of contexts on a single-port board caused the calculation to be off by one, which causes problems with the upper layers. The same problem exists for number of free contexts, which is also fixed here. Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* IB/qib: Correct ordering of reregister vs. port active eventsTodd Rimmer2012-05-141-9/+8
| | | | | | | | | | | | | | | | | | | | | | | | | When a port first goes active with SMA Set(PortInfo) and reregister bit set, the driver sends up the reregister event followed by a port active event. The problem is that in response to reregister event most apps try to issue a SA query of some sort, but that fails because port is not active. The qib driver needs to a trivial change to correct this behavior. This issue has been there for a while; however the recent serdes work has probably made the delay between the reregister event and the active event larger and hence opened the race far enough so that its being seen more often. The patch also changes the clientrereg local to a u8 and saves off the rereg bit into it. The code following the nested subn_get_portinfo() now restores that bit per o14-12.2.1 with a logical OR from that copy. Reviewed-by: Ram Vepa <ram.vepa@qlogic.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* IB/qib: Optimize pio ack buffer allocationMike Marciniszyn2012-05-145-9/+28
| | | | | | | | | | | | | | | | | | | | | This patch optimizes pio buffer allocation in the kernel. For qib, kernel pio buffers are used for sending acks. The code to allocate the buffer would always start at 0 until it found a buffer. This means that an average of 64 comparisions were done on each allocate, since the busy bit won't be cleared until the bits are refreshed when buffers are exhausted. This patch adds two new fields in the devdata struct, last_pio and min_kernel_pio. last_pio is the last buffer that was allocated. min_kernel_pio is the lowest potential available buffer. min_kernel_pio is modifed as contexts are allocated and deallocted. Reviewed-by: Ramkrishna Vepa <ramkrishna.vepa@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* IB/qib: Add prefetch for eager buffersMike Marciniszyn2012-05-141-1/+4
| | | | | | | | Add a prefetch call when a packet has been stored. The nature of the prefetch is correctly determined by the alternatives mechanism. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
*-. Merge branches 'cma', 'cxgb3', 'cxgb4', 'ehca', 'iser', 'mad', 'nes', 'qib', ↵Roland Dreier2012-03-194-37/+105
|\ \ | | | | | | | | | 'srp' and 'srpt' into for-next
| | * IB/qib: Avoid filtering LID on SMA portinfoMike Marciniszyn2012-02-251-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current get portinfo handling filters the LID being sent, changing zero to 0xffff. This causes OpenSM to log excessive warning messages. Reviewed-by: Edward Mascarenhas <edward.mascarenhas@qlogic.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | * IB/qib: Add logic for affinity hintMike Marciniszyn2012-02-253-34/+104
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Call irq_set_affinity_hint() to give userspace programs such as irqbalance the information to be able to distribute qib interrupts appropriately. The logic allocates all non-receive interrupts to the first CPU local to the HCA. Receive interrupts are allocated round robin starting with the second CPU local to the HCA with potential wrap back to the second CPU. This patch also adds a refinement to the name registered for MSI-X interrupts so that user level scripts can determine the device associated with the IRQs when there are multiple HCAs with a potentially different set of local CPUs. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* | IB: Change CQE "csum_ok" field to a bit flagOr Gerlitz2012-03-082-2/+0
|/ | | | | | | | | | | | | | | | | Use a bit in wc_flags rather then a whole integer to hold the "checksum OK" flag. By itself, this change doesn't reduce the size of struct ib_wc on 64bit machines -- it stays on 56 bytes because of padding. However, it will allow to add more fields in the future without enlarging the struct. Also, it will let us have a unified approach with future libibverbs checksum offload reporting, because a bit flag doesn't break the library ABI. This patch was suggested during conversation with Liran Liss <liranl@mellanox.com>. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* IB/qib: Roll back PCIe tuning changeMike Marciniszyn2012-01-271-1/+1
| | | | | | | | | | | | | | | | | | Commit 8d4548f2b ("IB/qib: Default some module parameters optimally") introduced an issue with older root complexes. They cannot handle the pcie_caps of 0x51 (MaxReadReq 4096, MaxPayload=256). A typical diagnostic in this situation reported by syslog contains the text: [PCIe Poisoned TLP][Send DMA memory read] Restore the module paramter default to zero with will avoid any changes in the root complex. Reviewed-by: Mark Debbage <mark.debbage@qlogic.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* IB/qib: Use GFP_ATOMIC when locks are heldJulia Lawall2012-01-271-1/+1
| | | | | | | | | | | | alloc_dummy_hdrq() is called with locks held and thus should not use GFP_KERNEL. The semantic patch that makes this report is available in scripts/coccinelle/locks/call_kern.cocci. Signed-off-by: Julia Lawall <julia.lawall@lip6.fr> Acked-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* Merge tag 'infiniband-for-linus' of ↵Linus Torvalds2012-01-0811-44/+70
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband infiniband changes for 3.3 merge window * tag 'infiniband-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: rdma/core: Fix sparse warnings RDMA/cma: Fix endianness bugs RDMA/nes: Fix terminate during AE RDMA/nes: Make unnecessarily global nes_set_pau() static RDMA/nes: Change MDIO bus clock to 2.5MHz IB/cm: Fix layout of APR message IB/mlx4: Fix SL to 802.1Q priority-bits mapping for IBoE IB/qib: Default some module parameters optimally IB/qib: Optimize locking for get_txreq() IB/qib: Fix a possible data corruption when receiving packets IB/qib: Eliminate 64-bit jiffies use IB/qib: Fix style issues IB/uverbs: Protect QP multicast list
| * IB/qib: Default some module parameters optimallyMike Marciniszyn2012-01-032-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Minimize the need for users to have to set module parameters to get good performance. The following two parameters are changed: - rcvhdrcnt to twice the rcvegrcnt - pcie_caps=0x51 The rcvhdrcnt at twice the egrcount allows the preemptive NAK code during reception to function in 100% of the cases rather than a sender jiffies-based timeout. The pcie_caps default of 0x51 will set the proposed MaxPayload and MaxReceiveReqest to 256 and 4096 respectively. The capabilities on the root complex will be used to limit those values. Reviewed-by: Ram Vepa <ram.vepa@qlogic.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| * IB/qib: Optimize locking for get_txreq()Mike Marciniszyn2012-01-031-10/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current code locks the QP s_lock, followed by the pending_lock, I guess to to protect against the allocate failing. This patch only locks the pending_lock, assuming that the empty case is an exeception, in which case the pending_lock is dropped, and the original code is executed. This will save a lock of s_lock in the normal case. The observation is that the sdma descriptors will deplete at twice the rate of txreq's, so this should be rare. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| * IB/qib: Fix a possible data corruption when receiving packetsRam Vepa2012-01-033-4/+10
| | | | | | | | | | | | | | | | | | | | | | Prevent a receive data corruption by ensuring that the write to update the rcvhdrheadn register to generate an interrupt is at the very end of the receive processing. Signed-off-by: Ramkrishna Vepa <ram.vepa@qlogic.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Cc: <stable@kernel.org> Signed-off-by: Roland Dreier <roland@purestorage.com>
OpenPOWER on IntegriCloud