summaryrefslogtreecommitdiffstats
path: root/drivers/xen
Commit message (Collapse)AuthorAgeFilesLines
* xen/events: don't bind non-percpu VIRQs with percpu chipDavid Vrabel2015-05-191-4/+8
| | | | | | | | | | | | | | | | | | | | | | A non-percpu VIRQ (e.g., VIRQ_CONSOLE) may be freed on a different VCPU than it is bound to. This can result in a race between handle_percpu_irq() and removing the action in __free_irq() because handle_percpu_irq() does not take desc->lock. The interrupt handler sees a NULL action and oopses. Only use the percpu chip/handler for per-CPU VIRQs (like VIRQ_TIMER). # cat /proc/interrupts | grep virq 40: 87246 0 xen-percpu-virq timer0 44: 0 0 xen-percpu-virq debug0 47: 0 20995 xen-percpu-virq timer1 51: 0 0 xen-percpu-virq debug1 69: 0 0 xen-dyn-virq xen-pcpu 74: 0 0 xen-dyn-virq mce 75: 29 0 xen-dyn-virq hvc_console Signed-off-by: David Vrabel <david.vrabel@citrix.com> Cc: <stable@vger.kernel.org>
* xen: Add __GFP_DMA flag when xen_swiotlb_init gets free pages on ARMStefano Stabellini2015-05-061-1/+1
| | | | | | | | | | | | | | | | | Make sure that xen_swiotlb_init allocates buffers that are DMA capable when at least one memblock is available below 4G. Otherwise we assume that all devices on the SoC can cope with >4G addresses. We do this on ARM and ARM64, where dom0 is mapped 1:1, so pfn == mfn in this case. No functional changes on x86. From: Chen Baozi <baozich@gmail.com> Signed-off-by: Chen Baozi <baozich@gmail.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Tested-by: Chen Baozi <baozich@gmail.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
* xen/events: Set irq_info->evtchn before binding the channel to CPU in ↵Boris Ostrovsky2015-05-051-1/+1
| | | | | | | | | | | | __startup_pirq() .. because bind_evtchn_to_cpu(evtchn, cpu) will map evtchn to 'info' and pass 'info' down to xen_evtchn_port_bind_to_cpu(). Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Tested-by: Annie Li <annie.li@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
* xen/xenbus: Update xenbus event channel on resumeBoris Ostrovsky2015-05-051-0/+29
| | | | | | | | | After a resume the hypervisor/tools may change xenbus event channel number. We should re-query it. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
* xen/events: Clear cpu_evtchn_mask before resumingBoris Ostrovsky2015-05-052-2/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a guest is resumed, the hypervisor may change event channel assignments. If this happens and the guest uses 2-level events it is possible for the interrupt to be claimed by wrong VCPU since cpu_evtchn_mask bits may be stale. This can happen even though evtchn_2l_bind_to_cpu() attempts to clear old bits: irq_info that is passed in is not necessarily the original one (from pre-migration times) but instead is freshly allocated during resume and so any information about which CPU the channel was bound to is lost. Thus we should clear the mask during resume. We also need to make sure that bits for xenstore and console channels are set when these two subsystems are resumed. While rebind_evtchn_irq() (which is invoked for both of them on a resume) calls irq_set_affinity(), the latter will in fact postpone setting affinity until handling the interrupt. But because cpu_evtchn_mask will have bits for these two cleared we won't be able to take the interrupt. With that in mind, we need to bind those two channels explicitly in rebind_evtchn_irq(). We will keep irq_set_affinity() so that we have a pass through generic irq affinity code later, in case something needs to be updated there as well. (Also replace cpumask_of(0) with cpumask_of(info->cpu) in rebind_evtchn_irq(): it should be set to zero in preceding xen_irq_info_evtchn_setup().) Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reported-by: Annie Li <annie.li@oracle.com> Cc: <stable@vger.kernel.org> # 3.14+ Signed-off-by: David Vrabel <david.vrabel@citrix.com>
* xen-pciback: Add name prefix to global 'permissive' variableBen Hutchings2015-04-293-5/+5
| | | | | | | | | | | The variable for the 'permissive' module parameter used to be static but was recently changed to be extern. This puts it in the kernel global namespace if the driver is built-in, so its name should begin with a prefix identifying the driver. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Fixes: af6fc858a35b ("xen-pciback: limit guest control of command register") Signed-off-by: David Vrabel <david.vrabel@citrix.com>
* xen: Suspend ticks on all CPUs during suspendBoris Ostrovsky2015-04-291-3/+6
| | | | | | | | | | | | | | | | | | | | | Commit 77e32c89a711 ("clockevents: Manage device's state separately for the core") decouples clockevent device's modes from states. With this change when a Xen guest tries to resume, it won't be calling its set_mode op which needs to be done on each VCPU in order to make the hypervisor aware that we are in oneshot mode. This happens because clockevents_tick_resume() (which is an intermediate step of resuming ticks on a processor) doesn't call clockevents_set_state() anymore and because during suspend clockevent devices on all VCPUs (except for the one doing the suspend) are left in ONESHOT state. As result, during resume the clockevents state machine will assume that device is already where it should be and doesn't need to be updated. To avoid this problem we should suspend ticks on all VCPUs during suspend. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
* xen/grant: introduce func gnttab_unmap_refs_sync()Bob Liu2015-04-272-25/+31
| | | | | | | | | | | There are several place using gnttab async unmap and wait for completion, so move the common code to a function gnttab_unmap_refs_sync(). Signed-off-by: Bob Liu <bob.liu@oracle.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
* Merge branch 'for-next' of ↵Linus Torvalds2015-04-241-61/+13
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending Pull SCSI target updates from Nicholas Bellinger: "Lots of activity in target land the last months. The highlights include: - Convert fabric drivers tree-wide to target_register_template() (hch + bart) - iser-target hardening fixes + v1.0 improvements (sagi) - Convert iscsi_thread_set usage to kthread.h + kill iscsi_target_tq.c (sagi + nab) - Add support for T10-PI WRITE_STRIP + READ_INSERT operation (mkp + sagi + nab) - DIF fixes for CONFIG_DEBUG_SG=y + UNMAP file emulation (akinobu + sagi + mkp) - Extended TCMU ABI v2 for future BIDI + DIF support (andy + ilias) - Fix COMPARE_AND_WRITE handling for NO_ALLLOC drivers (hch + nab) Thanks to everyone who contributed this round with new features, bug-reports, fixes, cleanups and improvements. Looking forward, it's currently shaping up to be a busy v4.2 as well" * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (69 commits) target: Put TCMU under a new config option target: Version 2 of TCMU ABI target: fix tcm_mod_builder.py target/file: Fix UNMAP with DIF protection support target/file: Fix SG table for prot_buf initialization target/file: Fix BUG() when CONFIG_DEBUG_SG=y and DIF protection enabled target: Make core_tmr_abort_task() skip TMFs target/sbc: Update sbc_dif_generate pr_debug output target/sbc: Make internal DIF emulation honor ->prot_checks target/sbc: Return INVALID_CDB_FIELD if DIF + sess_prot_type disabled target: Ensure sess_prot_type is saved across session restart target/rd: Don't pass incomplete scatterlist entries to sbc_dif_verify_* target: Remove the unused flag SCF_ACK_KREF target: Fix two sparse warnings target: Fix COMPARE_AND_WRITE with SG_TO_MEM_NOALLOC handling target: simplify the target template registration API target: simplify target_xcopy_init_pt_lun target: remove the unused SCF_CMD_XCOPY_PASSTHROUGH flag target/rd: reduce code duplication in rd_execute_rw() tcm_loop: fixup tpgt string to integer conversion ...
| * target: simplify the target template registration APIChristoph Hellwig2015-04-141-61/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of calling target_fabric_configfs_init() + target_fabric_configfs_register() / target_fabric_configfs_deregister() target_fabric_configfs_free() from every target driver, rewrite the API so that we have simple register/unregister functions that operate on a const operations vector. This patch also fixes a memory leak in several target drivers. Several target drivers namely called target_fabric_configfs_deregister() without calling target_fabric_configfs_free(). A large part of this patch is based on earlier changes from Bart Van Assche <bart.vanassche@sandisk.com>. (v2: Add a new TF_CIT_SETUP_DRV macro so that the core configfs code can declare attributes as either core only or for drivers) Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
* | Merge tag 'arm64-upstream' of ↵Linus Torvalds2015-04-242-1/+5
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull initial ACPI support for arm64 from Will Deacon: "This series introduces preliminary ACPI 5.1 support to the arm64 kernel using the "hardware reduced" profile. We don't support any peripherals yet, so it's fairly limited in scope: - MEMORY init (UEFI) - ACPI discovery (RSDP via UEFI) - CPU init (FADT) - GIC init (MADT) - SMP boot (MADT + PSCI) - ACPI Kconfig options (dependent on EXPERT) ACPI for arm64 has been in development for a while now and hardware has been available that can boot with either FDT or ACPI tables. This has been made possible by both changes to the ACPI spec to cater for ARM-based machines (known as "hardware-reduced" in ACPI parlance) but also a Linaro-driven effort to get this supported on top of the Linux kernel. This pull request is the result of that work. These changes allow us to initialise the CPUs, interrupt controller, and timers via ACPI tables, with memory information and cmdline coming from EFI. We don't support a hybrid ACPI/FDT scheme. Of course, there is still plenty of work to do (a serial console would be nice!) but I expect that to happen on a per-driver basis after this core series has been merged. Anyway, the diff stat here is fairly horrible, but splitting this up and merging it via all the different subsystems would have been extremely painful. Instead, we've got all the relevant Acks in place and I've not seen anything other than trivial (Kconfig) conflicts in -next (for completeness, I've included my resolution below). Nearly half of the insertions fall under Documentation/. So, we'll see how this goes. Right now, it all depends on EXPERT and I fully expect people to use FDT by default for the immediate future" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (31 commits) ARM64 / ACPI: make acpi_map_gic_cpu_interface() as void function ARM64 / ACPI: Ignore the return error value of acpi_map_gic_cpu_interface() ARM64 / ACPI: fix usage of acpi_map_gic_cpu_interface ARM64: kernel: acpi: honour acpi=force command line parameter ARM64: kernel: acpi: refactor ACPI tables init and checks ARM64: kernel: psci: let ACPI probe PSCI version ARM64: kernel: psci: factor out probe function ACPI: move arm64 GSI IRQ model to generic GSI IRQ layer ARM64 / ACPI: Don't unflatten device tree if acpi=force is passed ARM64 / ACPI: additions of ACPI documentation for arm64 Documentation: ACPI for ARM64 ARM64 / ACPI: Enable ARM64 in Kconfig XEN / ACPI: Make XEN ACPI depend on X86 ARM64 / ACPI: Select ACPI_REDUCED_HARDWARE_ONLY if ACPI is enabled on ARM64 clocksource / arch_timer: Parse GTDT to initialize arch timer irqchip: Add GICv2 specific ACPI boot support ARM64 / ACPI: Introduce ACPI_IRQ_MODEL_GIC and register device's gsi ACPI / processor: Make it possible to get CPU hardware ID via GICC ACPI / processor: Introduce phys_cpuid_t for CPU hardware ID ARM64 / ACPI: Parse MADT for SMP initialization ...
| * | XEN / ACPI: Make XEN ACPI depend on X86Hanjun Guo2015-03-262-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When ACPI is enabled on ARM64, XEN ACPI will also compiled into the kernel, but XEN ACPI is x86 dependent, so introduce CONFIG_XEN_ACPI to make it depend on x86 before XEN ACPI is functional on ARM64. CC: Julien Grall <julien.grall@linaro.org> CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> CC: Boris Ostrovsky <boris.ostrovsky@oracle.com> CC: David Vrabel <david.vrabel@citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
* | | Merge tag 'stable/for-linus-4.1-rc0-tag' of ↵Linus Torvalds2015-04-1613-235/+665
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen features and fixes from David Vrabel: - use a single source list of hypercalls, generating other tables etc. at build time. - add a "Xen PV" APIC driver to support >255 VCPUs in PV guests. - significant performance improve to guest save/restore/migration. - scsiback/front save/restore support. - infrastructure for multi-page xenbus rings. - misc fixes. * tag 'stable/for-linus-4.1-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/pci: Try harder to get PXM information for Xen xenbus_client: Extend interface to support multi-page ring xen-pciback: also support disabling of bus-mastering and memory-write-invalidate xen: support suspend/resume in pvscsi frontend xen: scsiback: add LUN of restored domain xen-scsiback: define a pr_fmt macro with xen-pvscsi xen/mce: fix up xen_late_init_mcelog() error handling xen/privcmd: improve performance of MMAPBATCH_V2 xen: unify foreign GFN map/unmap for auto-xlated physmap guests x86/xen/apic: WARN with details. x86/xen: Provide a "Xen PV" APIC driver to support >255 VCPUs xen/pciback: Don't print scary messages when unsupported by hypervisor. xen: use generated hypercall symbols in arch/x86/xen/xen-head.S xen: use generated hypervisor symbols in arch/x86/xen/trace.c xen: synchronize include/xen/interface/xen.h with xen xen: build infrastructure for generating hypercall depending symbols xen: balloon: Use static attribute groups for sysfs entries xen: pcpu: Use static attribute groups for sysfs entry
| * | | xen/pci: Try harder to get PXM information for XenRoss Lagerwall2015-04-151-2/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the device being added to Xen is not contained in the ACPI table, walk the PCI device tree to find a parent that is contained in the ACPI table before finding the PXM information from this device. Previously, it would try to get a handle for the device, then the device's bridge, then the physfn. This changes the order so that it tries to get a handle for the device, then the physfn, the walks up the PCI device tree. Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
| * | | xenbus_client: Extend interface to support multi-page ringWei Liu2015-04-153-103/+288
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Originally Xen PV drivers only use single-page ring to pass along information. This might limit the throughput between frontend and backend. The patch extends Xenbus driver to support multi-page ring, which in general should improve throughput if ring is the bottleneck. Changes to various frontend / backend to adapt to the new interface are also included. Affected Xen drivers: * blkfront/back * netfront/back * pcifront/back * scsifront/back * vtpmfront The interface is documented, as before, in xenbus_client.c. Signed-off-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Signed-off-by: Bob Liu <bob.liu@oracle.com> Cc: Konrad Wilk <konrad.wilk@oracle.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
| * | | xen-pciback: also support disabling of bus-mastering and memory-write-invalidateJan Beulich2015-03-161-1/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's not clear to me why only the enabling operation got handled so far. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
| * | | xen: scsiback: add LUN of restored domainJuergen Gross2015-03-161-5/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a xen domain is being restored the LUN state of a pvscsi device is "Connected" and not "Initialising" as in case of attaching a new pvscsi LUN. This must be taken into account when adding a new pvscsi device for a domain as otherwise the pvscsi LUN won't be connected to the SCSI target associated with it. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
| * | | xen-scsiback: define a pr_fmt macro with xen-pvscsiTao Chen2015-03-161-39/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the {xen-pvscsi: } prefix in pr_fmt and remove DPRINTK, then replace all DPRINTK with pr_debug. Also fixed up some comments just as eliminate redundant whitespace and format the code. These will make the code easier to read. Signed-off-by: Tao Chen <boby.chen@huawei.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
| * | | xen/mce: fix up xen_late_init_mcelog() error handlingDan Carpenter2015-03-161-7/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Static checkers complain about the missing call to misc_deregister() if bind_virq_for_mce() fails. Also I reversed the tests so that we do error handling instead of success handling. That way we just have a series of function calls instead of the more complicated nested if statements in the original code. Let's preserve the error codes as well. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
| * | | xen/privcmd: improve performance of MMAPBATCH_V2David Vrabel2015-03-162-53/+110
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make the IOCTL_PRIVCMD_MMAPBATCH_V2 (and older V1 version) map multiple frames at a time rather than one at a time, despite the pages being non-consecutive GFNs. xen_remap_foreign_mfn_array() is added which maps an array of GFNs (instead of a consecutive range of GFNs). Since per-frame errors are returned in an array, privcmd must set the MMAPBATCH_V1 error bits as part of the "report errors" phase, after all the frames are mapped. Migrate times are significantly improved (when using a PV toolstack domain). For example, for an idle 12 GiB PV guest: Before After real 0m38.179s 0m26.868s user 0m15.096s 0m13.652s sys 0m28.988s 0m18.732s Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
| * | | xen: unify foreign GFN map/unmap for auto-xlated physmap guestsDavid Vrabel2015-03-163-0/+140
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Auto-translated physmap guests (arm, arm64 and x86 PVHVM/PVH) map and unmap foreign GFNs using the same method (updating the physmap). Unify the two arm and x86 implementations into one commont one. Note that on arm and arm64, the correct error code will be returned (instead of always -EFAULT) and map/unmap failure warnings are no longer printed. These changes are required if the foreign domain is paging (-ENOENT failures are expected and must be propagated up to the caller). Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
| * | | xen/pciback: Don't print scary messages when unsupported by hypervisor.Konrad Rzeszutek Wilk2015-03-161-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We print at the warninig level messages such as: pciback 0000:90:00.5: MSI-X preparation failed (-38) which is due to the hypervisor not supporting this sub-hypercall (which was added in Xen 4.3). Instead of having scary messages all the time - only have it when the hypercall is actually supported. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
| * | | xen: balloon: Use static attribute groups for sysfs entriesTakashi Iwai2015-03-161-25/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of manual calls of device_create_file(), device_remove_file() and sysfs_create_group(), assign static attribute groups to the device to register. This simplifies the code and avoids possible races. Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
| * | | xen: pcpu: Use static attribute groups for sysfs entryTakashi Iwai2015-03-161-16/+28
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of manual calls of device_create_file() and device_remove_file(), assign the static attribute groups to the device to register. The conditional build of sysfs is done in is_visible callback instead. Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
* | | cleancache: forbid overriding cleancache_opsVladimir Davydov2015-04-141-7/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, cleancache_register_ops returns the previous value of cleancache_ops to allow chaining. However, chaining, as it is implemented now, is extremely dangerous due to possible pool id collisions. Suppose, a new cleancache driver is registered after the previous one assigned an id to a super block. If the new driver assigns the same id to another super block, which is perfectly possible, we will have two different filesystems using the same id. No matter if the new driver implements chaining or not, we are likely to get data corruption with such a configuration eventually. This patch therefore disables the ability to override cleancache_ops altogether as potentially dangerous. If there is already cleancache driver registered, all further calls to cleancache_register_ops will return EBUSY. Since no user of cleancache implements chaining, we only need to make minor changes to the code outside the cleancache core. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Stefan Hengelein <ilendir@googlemail.com> Cc: Florian Schmaus <fschmaus@gmail.com> Cc: Andor Daam <andor.daam@googlemail.com> Cc: Dan Magenheimer <dan.magenheimer@oracle.com> Cc: Bob Liu <lliubbo@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge tag 'stable/for-linus-4.0-rc6-tag' of ↵Linus Torvalds2015-04-022-0/+40
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen regression fixes from David Vrabel: "Fix two regressions in the balloon driver's use of memory hotplug when used in a PV guest" * tag 'stable/for-linus-4.0-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/balloon: before adding hotplugged memory, set frames to invalid x86/xen: prepare p2m list for memory hotplug
| * | | xen/balloon: before adding hotplugged memory, set frames to invalidJuergen Gross2015-03-231-0/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 25b884a83d487fd62c3de7ac1ab5549979188482 ("x86/xen: set regions above the end of RAM as 1:1") introduced a regression. To be able to add memory pages which were added via memory hotplug to a pv domain, the pages must be "invalid" instead of "identity" in the p2m list before they can be added. Suggested-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com> Cc: <stable@vger.kernel.org> # 3.16+ Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
| * | | x86/xen: prepare p2m list for memory hotplugJuergen Gross2015-03-231-0/+17
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 054954eb051f35e74b75a566a96fe756015352c8 ("xen: switch to linear virtual mapped sparse p2m list") introduced a regression regarding to memory hotplug for a pv-domain: as the virtual space for the p2m list is allocated for the to be expected memory size of the domain only, hotplugged memory above that size will not be usable by the domain. Correct this by using a configurable size for the p2m list in case of memory hotplug enabled (default supported memory size is 512 GB for 64 bit domains and 4 GB for 32 bit domains). Signed-off-by: Juergen Gross <jgross@suse.com> Cc: <stable@vger.kernel.org> # 3.19+ Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
* | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pendingLinus Torvalds2015-03-211-5/+2
|\ \ \ | |/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull SCSI target fixes from Nicholas Bellinger: "Here are current target-pending fixes for v4.0-rc5 code that have made their way into the queue over the last weeks. The fixes this round include: - Fix long-standing iser-target logout bug related to early conn_logout_comp completion, resulting in iscsi_conn use-after-tree OOpsen. (Sagi + nab) - Fix long-standing tcm_fc bug in ft_invl_hw_context() failure handing for DDP hw offload. (DanC) - Fix incorrect use of unprotected __transport_register_session() in tcm_qla2xxx + other single local se_node_acl fabrics. (Bart) - Fix reference leak in target_submit_cmd() -> target_get_sess_cmd() for ack_kref=1 failure path. (Bart) - Fix pSCSI backend ->get_device_type() statistics OOPs with un-configured device. (Olaf + nab) - Fix virtual LUN=0 target_configure_device failure OOPs at modprobe time. (Claudio + nab) - Fix FUA write false positive failure regression in v4.0-rc1 code. (Christophe Vu-Brugier + HCH)" * git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: target: do not reject FUA CDBs when write cache is enabled but emulate_write_cache is 0 target: Fix virtual LUN=0 target_configure_device failure OOPs target/pscsi: Fix NULL pointer dereference in get_device_type tcm_fc: missing curly braces in ft_invl_hw_context() target: Fix reference leak in target_get_sess_cmd() error path loop/usb/vhost-scsi/xen-scsiback: Fix use of __transport_register_session tcm_qla2xxx: Fix incorrect use of __transport_register_session iscsi-target: Avoid early conn_logout_comp for iser connections Revert "iscsi-target: Avoid IN_LOGOUT failure case for iser-target" target: Disallow changing of WRITE cache/FUA attrs after export
| * | loop/usb/vhost-scsi/xen-scsiback: Fix use of __transport_register_sessionBart Van Assche2015-03-191-5/+2
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch changes loopback, usb-gadget, vhost-scsi and xen-scsiback fabric code to invoke transport_register_session() instead of the unprotected flavour, to ensure se_tpg->session_lock is taken when adding new session list nodes to se_tpg->tpg_sess_list. Note that since these four fabric drivers already hold their own internal TPG mutexes when accessing se_tpg->tpg_sess_list, and consist of a single se_session created through configfs attribute access, no list corruption can currently occur. So for correctness sake, go ahead and use the se_tpg->session_lock protected version for these four fabric drivers. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
* | xen-pciback: limit guest control of command registerJan Beulich2015-03-113-14/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Otherwise the guest can abuse that control to cause e.g. PCIe Unsupported Request responses by disabling memory and/or I/O decoding and subsequently causing (CPU side) accesses to the respective address ranges, which (depending on system configuration) may be fatal to the host. Note that to alter any of the bits collected together as PCI_COMMAND_GUEST permissive mode is now required to be enabled globally or on the specific device. This is CVE-2015-2150 / XSA-120. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
* | xen/events: avoid NULL pointer dereference in dom0 on large machinesJuergen Gross2015-03-061-6/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Using the pvops kernel a NULL pointer dereference was detected on a large machine (144 processors) when booting as dom0 in evtchn_fifo_unmask() during assignment of a pirq. The event channel in question was the first to need a new entry in event_array[] in events_fifo.c. Unfortunately xen_irq_info_pirq_setup() is called with evtchn being 0 for a new pirq and the real event channel number is assigned to the pirq only during __startup_pirq(). It is mandatory to call xen_evtchn_port_setup() after assigning the event channel number to the pirq to make sure all memory needed for the event channel is allocated. Signed-off-by: Juergen Gross <jgross@suse.com> Cc: <stable@vger.kernel.org> # 3.14+ Signed-off-by: David Vrabel <david.vrabel@citrix.com>
* | xen-scsiback: mark pvscsi frontend request consumed only after last readJuergen Gross2015-02-231-8/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A request in the ring buffer mustn't be read after it has been marked as consumed. Otherwise it might already have been reused by the frontend without violating the ring protocol. To avoid inconsistencies in the backend only work on a private copy of the request. This will ensure a malicious guest not being able to bypass consistency checks of the backend by modifying an active request. Signed-off-by: Juergen Gross <jgross@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
* | x86/xen: allow privcmd hypercalls to be preemptedDavid Vrabel2015-02-233-1/+47
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | Hypercalls submitted by user space tools via the privcmd driver can take a long time (potentially many 10s of seconds) if the hypercall has many sub-operations. A fully preemptible kernel may deschedule such as task in any upcall called from a hypercall continuation. However, in a kernel with voluntary or no preemption, hypercall continuations in Xen allow event handlers to be run but the task issuing the hypercall will not be descheduled until the hypercall is complete and the ioctl returns to user space. These long running tasks may also trigger the kernel's soft lockup detection. Add xen_preemptible_hcall_begin() and xen_preemptible_hcall_end() to bracket hypercalls that may be preempted. Use these in the privcmd driver. When returning from an upcall, call xen_maybe_preempt_hcall() which adds a schedule point if if the current task was within a preemptible hypercall. Since _cond_resched() can move the task to a different CPU, clear and set xen_in_preemptible_hcall around the call. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
* Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds2015-02-111-0/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull first round of SCSI updates from James Bottomley: "This is the usual grab bag of driver updates (hpsa, storvsc, mp2sas, megaraid_sas, ses) plus an assortment of minor updates. There's also an update to ufs which adds new phy drivers and finally a new logging infrastructure for SCSI" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (114 commits) scsi_logging: return void for dev_printk() functions scsi: print single-character strings with seq_putc scsi: merge consecutive seq_puts calls scsi: replace seq_printf with seq_puts aha152x: replace seq_printf with seq_puts advansys: replace seq_printf with seq_puts scsi: remove SPRINTF macro sg: remove an unused variable hpsa: Use local workqueues instead of system workqueues hpsa: add in P840ar controller model name hpsa: add in gen9 controller model names hpsa: detect and report failures changing controller transport modes hpsa: shorten the wait for the CISS doorbell mode change ack hpsa: refactor duplicated scan completion code into a new routine hpsa: move SG descriptor set-up out of hpsa_scatter_gather() hpsa: do not use function pointers in fast path command submission hpsa: print CDBs instead of kernel virtual addresses for uncommon errors hpsa: do not use a void pointer for scsi_cmd field of struct CommandList hpsa: return failed from device reset/abort handlers hpsa: check for ctlr lockup after command allocation in main io path ...
| * scsi: Conditionally compile in constants.cHannes Reinecke2015-01-091-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | Instead of having constants.c littered with ifdef statements we should be moving dummy functions into the header and condintionally compile in constants.c if selected. And update the Kconfig description to reflect the actual size difference. Suggested-by: Christoph Hellwig <hch@infradead.org> Tested-by: Robert Elliott <elliott@hp.com> Reviewed-by: Robert Elliott <elliott@hp.com> Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
* | Merge tag 'pm+acpi-3.20-rc1' of ↵Linus Torvalds2015-02-101-4/+4
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and power management updates from Rafael Wysocki: "We have a few new features this time, including a new SFI-based cpufreq driver, a new devfreq driver for Tegra Activity Monitor, a new devfreq class for providing its governors with raw utilization data and a new ACPI driver for AMD SoCs. Still, the majority of changes here are reworks of existing code to make it more straightforward or to prepare it for implementing new features on top of it. The primary example is the rework of ACPI resources handling from Jiang Liu, Thomas Gleixner and Lv Zheng with support for IOAPIC hotplug implemented on top of it, but there is quite a number of changes of this kind in the cpufreq core, ACPICA, ACPI EC driver, ACPI processor driver and the generic power domains core code too. The most active developer is Viresh Kumar with his cpufreq changes. Specifics: - Rework of the core ACPI resources parsing code to fix issues in it and make using resource offsets more convenient and consolidation of some resource-handing code in a couple of places that have grown analagous data structures and code to cover the the same gap in the core (Jiang Liu, Thomas Gleixner, Lv Zheng). - ACPI-based IOAPIC hotplug support on top of the resources handling rework (Jiang Liu, Yinghai Lu). - ACPICA update to upstream release 20150204 including an interrupt handling rework that allows drivers to install raw handlers for ACPI GPEs which then become entirely responsible for the given GPE and the ACPICA core code won't touch it (Lv Zheng, David E Box, Octavian Purdila). - ACPI EC driver rework to fix several concurrency issues and other problems related to events handling on top of the ACPICA's new support for raw GPE handlers (Lv Zheng). - New ACPI driver for AMD SoCs analogous to the LPSS (Low-Power Subsystem) driver for Intel chips (Ken Xue). - Two minor fixes of the ACPI LPSS driver (Heikki Krogerus, Jarkko Nikula). - Two new blacklist entries for machines (Samsung 730U3E/740U3E and 510R) where the native backlight interface doesn't work correctly while the ACPI one does (Hans de Goede). - Rework of the ACPI processor driver's handling of idle states to make the code more straightforward and less bloated overall (Rafael J Wysocki). - Assorted minor fixes related to ACPI and SFI (Andreas Ruprecht, Andy Shevchenko, Hanjun Guo, Jan Beulich, Rafael J Wysocki, Yaowei Bai). - PCI core power management modification to avoid resuming (some) runtime-suspended devices during system suspend if they are in the right states already (Rafael J Wysocki). - New SFI-based cpufreq driver for Intel platforms using SFI (Srinidhi Kasagar). - cpufreq core fixes, cleanups and simplifications (Viresh Kumar, Doug Anderson, Wolfram Sang). - SkyLake CPU support and other updates for the intel_pstate driver (Kristen Carlson Accardi, Srinivas Pandruvada). - cpufreq-dt driver cleanup (Markus Elfring). - Init fix for the ARM big.LITTLE cpuidle driver (Sudeep Holla). - Generic power domains core code fixes and cleanups (Ulf Hansson). - Operating Performance Points (OPP) core code cleanups and kernel documentation update (Nishanth Menon). - New dabugfs interface to make the list of PM QoS constraints available to user space (Nishanth Menon). - New devfreq driver for Tegra Activity Monitor (Tomeu Vizoso). - New devfreq class (devfreq_event) to provide raw utilization data to devfreq governors (Chanwoo Choi). - Assorted minor fixes and cleanups related to power management (Andreas Ruprecht, Krzysztof Kozlowski, Rickard Strandqvist, Pavel Machek, Todd E Brandt, Wonhong Kwon). - turbostat updates (Len Brown) and cpupower Makefile improvement (Sriram Raghunathan)" * tag 'pm+acpi-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (151 commits) tools/power turbostat: relax dependency on APERF_MSR tools/power turbostat: relax dependency on invariant TSC Merge branch 'pci/host-generic' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci into acpi-resources tools/power turbostat: decode MSR_*_PERF_LIMIT_REASONS tools/power turbostat: relax dependency on root permission ACPI / video: Add disable_native_backlight quirk for Samsung 510R ACPI / PM: Remove unneeded nested #ifdef USB / PM: Remove unneeded #ifdef and associated dead code intel_pstate: provide option to only use intel_pstate with HWP ACPI / EC: Add GPE reference counting debugging messages ACPI / EC: Add query flushing support ACPI / EC: Refine command storm prevention support ACPI / EC: Add command flushing support. ACPI / EC: Introduce STARTED/STOPPED flags to replace BLOCKED flag ACPI: add AMD ACPI2Platform device support for x86 system ACPI / table: remove duplicate NULL check for the handler of acpi_table_parse() ACPI / EC: Update revision due to raw handler mode. ACPI / EC: Reduce ec_poll() by referencing the last register access timestamp. ACPI / EC: Fix several GPE handling issues by deploying ACPI_GPE_DISPATCH_RAW_HANDLER mode. ACPICA: Events: Enable APIs to allow interrupt/polling adaptive request based GPE handling model ...
| * | ACPICA: Resources: Provide common part for struct acpi_resource_address ↵Lv Zheng2015-01-261-4/+4
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | structures. struct acpi_resource_address and struct acpi_resource_extended_address64 share substracts just at different offsets. To unify the parsing functions, OSPMs like Linux need a new ACPI_ADDRESS64_ATTRIBUTE as their substructs, so they can extract the shared data. This patch also synchronizes the structure changes to the Linux kernel. The usages are searched by matching the following keywords: 1. acpi_resource_address 2. acpi_resource_extended_address 3. ACPI_RESOURCE_TYPE_ADDRESS 4. ACPI_RESOURCE_TYPE_EXTENDED_ADDRESS And we found and fixed the usages in the following files: arch/ia64/kernel/acpi-ext.c arch/ia64/pci/pci.c arch/x86/pci/acpi.c arch/x86/pci/mmconfig-shared.c drivers/xen/xen-acpi-memhotplug.c drivers/acpi/acpi_memhotplug.c drivers/acpi/pci_root.c drivers/acpi/resource.c drivers/char/hpet.c drivers/pnp/pnpacpi/rsparser.c drivers/hv/vmbus_drv.c Build tests are passed with defconfig/allnoconfig/allyesconfig and defconfig+CONFIG_ACPI=n. Original-by: Thomas Gleixner <tglx@linutronix.de> Original-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* | xen/manage: Fix USB interaction issues when resumingRoss Lagerwall2015-02-061-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 61a734d305e1 ("xen/manage: Always freeze/thaw processes when suspend/resuming") ensured that userspace processes were always frozen before suspending to reduce interaction issues when resuming devices. However, freeze_processes() does not freeze kernel threads. Freeze kernel threads as well to prevent deadlocks with the khubd thread when resuming devices. This is what native suspend and resume does. Example deadlock: [ 7279.648010] [<ffffffff81446bde>] ? xen_poll_irq_timeout+0x3e/0x50 [ 7279.648010] [<ffffffff81448d60>] xen_poll_irq+0x10/0x20 [ 7279.648010] [<ffffffff81011723>] xen_lock_spinning+0xb3/0x120 [ 7279.648010] [<ffffffff810115d1>] __raw_callee_save_xen_lock_spinning+0x11/0x20 [ 7279.648010] [<ffffffff815620b6>] ? usb_control_msg+0xe6/0x120 [ 7279.648010] [<ffffffff81747e50>] ? _raw_spin_lock_irq+0x50/0x60 [ 7279.648010] [<ffffffff8174522c>] wait_for_completion+0xac/0x160 [ 7279.648010] [<ffffffff8109c520>] ? try_to_wake_up+0x2c0/0x2c0 [ 7279.648010] [<ffffffff814b60f2>] dpm_wait+0x32/0x40 [ 7279.648010] [<ffffffff814b6eb0>] device_resume+0x90/0x210 [ 7279.648010] [<ffffffff814b7d71>] dpm_resume+0x121/0x250 [ 7279.648010] [<ffffffff8144c570>] ? xenbus_dev_request_and_reply+0xc0/0xc0 [ 7279.648010] [<ffffffff814b80d5>] dpm_resume_end+0x15/0x30 [ 7279.648010] [<ffffffff81449fba>] do_suspend+0x10a/0x200 [ 7279.648010] [<ffffffff8144a2f0>] ? xen_pre_suspend+0x20/0x20 [ 7279.648010] [<ffffffff8144a1d0>] shutdown_handler+0x120/0x150 [ 7279.648010] [<ffffffff8144c60f>] xenwatch_thread+0x9f/0x160 [ 7279.648010] [<ffffffff810ac510>] ? finish_wait+0x80/0x80 [ 7279.648010] [<ffffffff8108d189>] kthread+0xc9/0xe0 [ 7279.648010] [<ffffffff8108d0c0>] ? flush_kthread_worker+0x80/0x80 [ 7279.648010] [<ffffffff8175087c>] ret_from_fork+0x7c/0xb0 [ 7279.648010] [<ffffffff8108d0c0>] ? flush_kthread_worker+0x80/0x80 [ 7441.216287] INFO: task khubd:89 blocked for more than 120 seconds. [ 7441.219457] Tainted: G X 3.13.11-ckt12.kz #1 [ 7441.222176] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 7441.225827] khubd D ffff88003f433440 0 89 2 0x00000000 [ 7441.229258] ffff88003ceb9b98 0000000000000046 ffff88003ce83000 0000000000013440 [ 7441.232959] ffff88003ceb9fd8 0000000000013440 ffff88003cd13000 ffff88003ce83000 [ 7441.236658] 0000000000000286 ffff88003d3e0000 ffff88003ceb9bd0 00000001001aa01e [ 7441.240415] Call Trace: [ 7441.241614] [<ffffffff817442f9>] schedule+0x29/0x70 [ 7441.243930] [<ffffffff81743406>] schedule_timeout+0x166/0x2c0 [ 7441.246681] [<ffffffff81075b80>] ? call_timer_fn+0x110/0x110 [ 7441.249339] [<ffffffff8174357e>] schedule_timeout_uninterruptible+0x1e/0x20 [ 7441.252644] [<ffffffff81077710>] msleep+0x20/0x30 [ 7441.254812] [<ffffffff81555f00>] hub_port_reset+0xf0/0x580 [ 7441.257400] [<ffffffff81558465>] hub_port_init+0x75/0xb40 [ 7441.259981] [<ffffffff814bb3c9>] ? update_autosuspend+0x39/0x60 [ 7441.262817] [<ffffffff814bb4f0>] ? pm_runtime_set_autosuspend_delay+0x50/0xa0 [ 7441.266212] [<ffffffff8155a64a>] hub_thread+0x71a/0x1750 [ 7441.268728] [<ffffffff810ac510>] ? finish_wait+0x80/0x80 [ 7441.271272] [<ffffffff81559f30>] ? usb_port_resume+0x670/0x670 [ 7441.274067] [<ffffffff8108d189>] kthread+0xc9/0xe0 [ 7441.276305] [<ffffffff8108d0c0>] ? flush_kthread_worker+0x80/0x80 [ 7441.279131] [<ffffffff8175087c>] ret_from_fork+0x7c/0xb0 [ 7441.281659] [<ffffffff8108d0c0>] ? flush_kthread_worker+0x80/0x80 Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Cc: stable@vger.kernel.org Signed-off-by: David Vrabel <david.vrabel@citrix.com>
* | xenbus: Add proper handling of XS_ERROR from Xenbus for transactions.Jennifer Herbert2015-02-051-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | If Xenstore sends back a XS_ERROR for TRANSACTION_END, the driver BUGs because it cannot find the matching transaction in the list. For TRANSACTION_START, it leaks memory. Check the message as returned from xenbus_dev_request_and_reply(), and clean up for TRANSACTION_START or discard the error for TRANSACTION_END. Signed-off-by: Jennifer Herbert <Jennifer.Herbert@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
* | xen/gntdev: provide find_special_page VMA operationDavid Vrabel2015-01-281-0/+11
| | | | | | | | | | | | | | | | | | For a PV guest, use the find_special_page op to find the right page. To handle VMAs being split, remember the start of the original VMA so the correct index in the pages array can be calculated. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
* | xen/gntdev: mark userspace PTEs as special on x86 PV guestsDavid Vrabel2015-01-281-0/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In an x86 PV guest, get_user_pages_fast() on a userspace address range containing foreign mappings does not work correctly because the M2P lookup of the MFN from a userspace PTE may return the wrong page. Force get_user_pages_fast() to fail on such addresses by marking the PTEs as special. If Xen has XENFEAT_gnttab_map_avail_bits (available since at least 4.0), we can do so efficiently in the grant map hypercall. Otherwise, it needs to be done afterwards. This is both inefficient and racy (the mapping is visible to the task before we fixup the PTEs), but will be fine for well-behaved applications that do not use the mapping until after the mmap() system call returns. Guests with XENFEAT_auto_translated_physmap (ARM and x86 HVM or PVH) do not need this since get_user_pages() has always worked correctly for them. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
* | xen/gntdev: safely unmap grants in case they are still in useJennifer Herbert2015-01-281-5/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | Use gnttab_unmap_refs_async() to wait until the mapped pages are no longer in use before unmapping them. This allows userspace programs to safely use Direct I/O and AIO to a network filesystem which may retain refs to pages in queued skbs after the filesystem I/O has completed. Signed-off-by: Jennifer Herbert <jennifer.herbert@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
* | xen/gntdev: convert priv->lock to a mutexDavid Vrabel2015-01-281-20/+20
| | | | | | | | | | | | | | | | Unmapping may require sleeping and we unmap while holding priv->lock, so convert it to a mutex. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
* | xen/grant-table: add a mechanism to safely unmap pages that are in useJennifer Herbert2015-01-281-0/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce gnttab_unmap_refs_async() that can be used to safely unmap pages that may be in use (ref count > 1). If the pages are in use the unmap is deferred and retried later. This polling is not very clever but it should be good enough if the cases where the delay is necessary are rare. The initial delay is 5 ms and is increased linearly on each subsequent retry (to reduce load if the page is in use for a long time). This is needed to allow block backends using grant mapping to safely use network storage (block or filesystem based such as iSCSI or NFS). The network storage driver may complete a block request whilst there is a queued network packet retry (because the ack from the remote end races with deciding to queue the retry). The pages for the retried packet would be grant unmapped and the network driver (or hardware) would access the unmapped page. Signed-off-by: Jennifer Herbert <jennifer.herbert@citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
* | xen: mark grant mapped pages as foreignJennifer Herbert2015-01-281-2/+41
| | | | | | | | | | | | | | | | | | | | Use the "foreign" page flag to mark pages that have a grant map. Use page->private to store information of the grant (the granting domain and the grant reference). Signed-off-by: Jennifer Herbert <jennifer.herbert@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
* | xen/grant-table: add helpers for allocating pagesDavid Vrabel2015-01-283-5/+34
| | | | | | | | | | | | | | | | Add gnttab_alloc_pages() and gnttab_free_pages() to allocate/free pages suitable to for granted maps. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
* | xen: remove scratch frames for ballooned pages and m2p overrideDavid Vrabel2015-01-281-84/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The scratch frame mappings for ballooned pages and the m2p override are broken. Remove them in preparation for replacing them with simpler mechanisms that works. The scratch pages did not ensure that the page was not in use. In particular, the foreign page could still be in use by hardware. If the guest reused the frame the hardware could read or write that frame. The m2p override did not handle the same frame being granted by two different grant references. Trying an M2P override lookup in this case is impossible. With the m2p override removed, the grant map/unmap for the kernel mappings (for x86 PV) can be easily batched in set_foreign_p2m_mapping() and clear_foreign_p2m_mapping(). Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
* | xen/grant-table: pre-populate kernel unmap ops for xen_gnttab_unmap_refs()David Vrabel2015-01-282-8/+16
| | | | | | | | | | | | | | | | | | | | | | When unmapping grants, instead of converting the kernel map ops to unmap ops on the fly, pre-populate the set of unmap ops. This allows the grant unmap for the kernel mappings to be trivially batched in the future. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
* | xen/tmem: mark xen_tmem_init() __initJan Beulich2015-01-231-1/+1
|/ | | | | | | | As a module_init() function, this should have been this way from the beginning. Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
OpenPOWER on IntegriCloud