summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
...
| * | | | | | blk-mq-debugfs: fix device sched directory for default schedulerOmar Sandoval2017-10-031-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In blk_mq_debugfs_register(), I remembered to set up the per-hctx sched directories if a default scheduler was already configured by blk_mq_sched_init() from blk_mq_init_allocated_queue(), but I didn't do the same for the device-wide sched directory. Fix it. Fixes: d332ce091813 ("blk-mq-debugfs: allow schedulers to register debugfs attributes") Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | null_blk: change configfs dependency to selectJens Axboe2017-10-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A recent commit made null_blk depend on configfs, which is kind of annoying since you now have to find this dependency and enable that as well. Discovered this since I no longer had null_blk available on a box I needed to debug, since it got killed when the config updated after the configfs change was merged. Fixes: 3bf2bd20734e ("nullb: add configfs interface") Reviewed-by: Shaohua Li <shli@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | blk-throttle: fix possible io stall when upgrade to maxJoseph Qi2017-10-031-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a case which will lead to io stall. The case is described as follows. /test1 |-subtest1 /test2 |-subtest2 And subtest1 and subtest2 each has 32 queued bios already. Now upgrade to max. In throtl_upgrade_state, it will try to dispatch bios as follows: 1) tg=subtest1, do nothing; 2) tg=test1, transfer 32 queued bios from subtest1 to test1; no pending left, no need to schedule next dispatch; 3) tg=subtest2, do nothing; 4) tg=test2, transfer 32 queued bios from subtest2 to test2; no pending left, no need to schedule next dispatch; 5) tg=/, transfer 8 queued bios from test1 to /, 8 queued bios from test2 to /, 8 queued bios from test1 to /, and 8 queued bios from test2 to /; note that test1 and test2 each still has 16 queued bios left; 6) tg=/, try to schedule next dispatch, but since disptime is now (update in tg_update_disptime, wait=0), pending timer is not scheduled in fact; 7) In throtl_upgrade_state it totally dispatches 32 queued bios and with 32 left. test1 and test2 each has 16 queued bios; 8) throtl_pending_timer_fn sees the left over bios, but could do nothing, because throtl_select_dispatch returns 0, and test1/test2 has no pending tg. The blktrace shows the following: 8,32 0 0 2.539007641 0 m N throtl upgrade to max 8,32 0 0 2.539072267 0 m N throtl /test2 dispatch nr_queued=16 read=0 write=16 8,32 7 0 2.539077142 0 m N throtl /test1 dispatch nr_queued=16 read=0 write=16 So force schedule dispatch if there are pending children. Reviewed-by: Shaohua Li <shli@fb.com> Signed-off-by: Joseph Qi <qijiang.qj@alibaba-inc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | MAINTAINERS: update list for NBDWouter Verhelst2017-10-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | nbd-general@sourceforge.net becomes nbd@other.debian.org, because sourceforge is just a spamtrap these days. Signed-off-by: Wouter Verhelst <w@uter.be> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | nbd: fix -ERESTARTSYS handlingJosef Bacik2017-10-021-1/+5
| |/ / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Christoph made it so that if we return'ed BLK_STS_RESOURCE whenever we got ERESTARTSYS from sending our packets we'd return BLK_STS_OK, which means we'd never requeue and just hang. We really need to return the right value from the upper layer. Fixes: fc17b6534eb8 ("blk-mq: switch ->queue_rq return value to blk_status_t") Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | bcache: use llist_for_each_entry_safe() in __closure_wake_up()Coly Li2017-09-271-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 09b3efec ("bcache: Don't reinvent the wheel but use existing llist API") replaces the following while loop by llist_for_each_entry(), - - while (reverse) { - cl = container_of(reverse, struct closure, list); - reverse = llist_next(reverse); - + llist_for_each_entry(cl, reverse, list) { closure_set_waiting(cl, 0); closure_sub(cl, CLOSURE_WAITING + 1); } This modification introduces a potential race by iterating a corrupted list. Here is how it happens. In the above modification, closure_sub() may wake up a process which is waiting on reverse list. If this process decides to wait again by calling closure_wait(), its cl->list will be added to another wait list. Then when llist_for_each_entry() continues to iterate next node, it will travel on another new wait list which is added in closure_wait(), not the original reverse list in __closure_wake_up(). It is more probably to happen on UP machine because the waked up process may preempt the process which wakes up it. Use llist_for_each_entry_safe() will fix the issue, the safe version fetch next node before waking up a process. Then the copy of next node will make sure list iteration stays on original reverse list. Fixes: 09b3efec81de ("bcache: Don't reinvent the wheel but use existing llist API") Signed-off-by: Coly Li <colyli@suse.de> Reported-by: Michael Lyle <mlyle@lyle.org> Reviewed-by: Byungchul Park <byungchul.park@lge.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | | | | Merge tag 'pci-v4.14-fixes-4' of ↵Linus Torvalds2017-10-063-27/+50
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fixes from Bjorn Helgaas: "Fix legacy IDE probe issues exposed by recent PCI core IRQ mapping changes (Bartlomiej Zolnierkiewicz, Lorenzo Pieralisi)" * tag 'pci-v4.14-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: ide: fix IRQ assignment for PCI bus order probing ide: pci: free PCI BARs on initialization failure ide: free hwif->portdev on hwif_init() failure
| * | | | | | ide: fix IRQ assignment for PCI bus order probingLorenzo Pieralisi2017-10-031-4/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We used to assign IRQs for all devices at boot-time, before any drivers claimed devices. The following commits: 30fdfb929e82 ("PCI: Add a call to pci_assign_irq() in pci_device_probe()") 0e4c2eeb758a ("alpha/PCI: Replace pci_fixup_irqs() call with host bridge IRQ mapping hooks") changed this so we now call pci_assign_irq() from pci_device_probe() when we call a driver's probe method. The ide_scan_pcibus() path (enabled by CONFIG_IDEPCI_PCIBUS_ORDER) bypasses pci_device_probe() so it can guarantee devices are claimed in order of PCI bus address. It calls the driver's probe method directly, so it misses the pci_assign_irq() call (and other PCI initialization functions), which causes failures like this: ide0: disabled, no IRQ ide0: failed to initialize IDE interface ide0: disabling port cmd64x 0000:00:02.0: IDE controller (0x1095:0x0646 rev 0x07) CMD64x_IDE 0000:00:02.0: BAR 0: can't reserve [io 0x8050-0x8057] cmd64x 0000:00:02.0: can't reserve resources CMD64x_IDE: probe of 0000:00:02.0 failed with error -16 ide_generic: please use "probe_mask=0x3f" module parameter for probing all legacy ISA IDE ports ------------[ cut here ]------------ WARNING: CPU: 0 PID: 1 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x94/0xd0 sysfs: cannot create duplicate filename '/class/ide_port/ide0' ... Trace: [<fffffc000048c9f4>] sysfs_warn_dup+0x94/0xd0 [<fffffc0000330928>] warn_slowpath_fmt+0x58/0x70 [<fffffc000048c9f4>] sysfs_warn_dup+0x94/0xd0 [<fffffc0000486d40>] kernfs_path_from_node+0x30/0x60 [<fffffc00004874ac>] kernfs_put+0x16c/0x2c0 [<fffffc00004874ac>] kernfs_put+0x16c/0x2c0 [<fffffc000048d010>] sysfs_do_create_link_sd.isra.2+0x100/0x120 [<fffffc00005b9d64>] device_add+0x2a4/0x7c0 [<fffffc00005ba5cc>] device_create_groups_vargs+0x14c/0x170 [<fffffc00005ba518>] device_create_groups_vargs+0x98/0x170 [<fffffc00005ba690>] device_create+0x50/0x70 [<fffffc00005df36c>] ide_host_register+0x48c/0xa00 [<fffffc00005df330>] ide_host_register+0x450/0xa00 [<fffffc00005ba2a0>] device_register+0x20/0x50 [<fffffc00005df330>] ide_host_register+0x450/0xa00 [<fffffc00005df944>] ide_host_add+0x64/0xe0 [<fffffc000079b41c>] kobject_uevent_env+0x16c/0x710 [<fffffc0000310288>] do_one_initcall+0x68/0x260 [<fffffc00007b13bc>] kernel_init+0x1c/0x1a0 ... ---[ end trace 24a70433c3e4d374 ]--- ide0: disabling port Fix the IRQ allocation issue by calling pci_assign_irq() from ide_scan_pcidev() before probing the IDE PCI drivers, so that IRQs for a given PCI device are allocated for the IDE PCI drivers to use them for device configuration. Fixes: 30fdfb929e82 ("PCI: Add a call to pci_assign_irq() in pci_device_probe()") Fixes: 0e4c2eeb758a ("alpha/PCI: Replace pci_fixup_irqs() call with host bridge IRQ mapping hooks") Link: http://lkml.kernel.org/r/32ec730f-c1b0-5584-cd35-f8a809122b96@roeck-us.net Reported-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> [bhelgaas: changelog] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Acked-by: David S. Miller <davem@davemloft.net> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com>
| * | | | | | ide: pci: free PCI BARs on initialization failureBartlomiej Zolnierkiewicz2017-10-031-23/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recent pci_assign_irq() changes uncovered a problem with missing freeing of PCI BARs on PCI IDE host initialization failure: ide0: disabled, no IRQ ide0: failed to initialize IDE interface ide0: disabling port cmd64x 0000:00:02.0: IDE controller (0x1095:0x0646 rev 0x07) CMD64x_IDE 0000:00:02.0: BAR 0: can't reserve [io 0x8050-0x8057] cmd64x 0000:00:02.0: can't reserve resources CMD64x_IDE: probe of 0000:00:02.0 failed with error -16 Fix the problem by adding missing freeing of PCI BARs to ide_setup_pci_controller() and ide_pci_init_two(). Fixes: 30fdfb929e82 ("PCI: Add a call to pci_assign_irq() in pci_device_probe()") Fixes: 0e4c2eeb758a ("alpha/PCI: Replace pci_fixup_irqs() call with host bridge IRQ mapping hooks") Link: http://lkml.kernel.org/r/32ec730f-c1b0-5584-cd35-f8a809122b96@roeck-us.net Reported-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> [bhelgaas: add Fixes:] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com>
| * | | | | | ide: free hwif->portdev on hwif_init() failureBartlomiej Zolnierkiewicz2017-10-031-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recent pci_assign_irq() changes uncovered a problem with missing freeing of ide_port class instance on hwif_init() failure in ide_host_register(): ide0: disabled, no IRQ ide0: failed to initialize IDE interface ide0: disabling port cmd64x 0000:00:02.0: IDE controller (0x1095:0x0646 rev 0x07) CMD64x_IDE 0000:00:02.0: BAR 0: can't reserve [io 0x8050-0x8057] cmd64x 0000:00:02.0: can't reserve resources CMD64x_IDE: probe of 0000:00:02.0 failed with error -16 ide_generic: please use "probe_mask=0x3f" module parameter for probing all legacy ISA IDE ports ------------[ cut here ]------------ WARNING: CPU: 0 PID: 1 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x94/0xd0 sysfs: cannot create duplicate filename '/class/ide_port/ide0' ... Trace: [<fffffc00003308a0>] __warn+0x160/0x190 [<fffffc000048c9f4>] sysfs_warn_dup+0x94/0xd0 [<fffffc0000330928>] warn_slowpath_fmt+0x58/0x70 [<fffffc000048c9f4>] sysfs_warn_dup+0x94/0xd0 [<fffffc0000486d40>] kernfs_path_from_node+0x30/0x60 [<fffffc00004874ac>] kernfs_put+0x16c/0x2c0 [<fffffc00004874ac>] kernfs_put+0x16c/0x2c0 [<fffffc000048d010>] sysfs_do_create_link_sd.isra.2+0x100/0x120 [<fffffc00005b9d64>] device_add+0x2a4/0x7c0 [<fffffc00005ba5cc>] device_create_groups_vargs+0x14c/0x170 [<fffffc00005ba518>] device_create_groups_vargs+0x98/0x170 [<fffffc00005ba690>] device_create+0x50/0x70 [<fffffc00005df36c>] ide_host_register+0x48c/0xa00 [<fffffc00005df330>] ide_host_register+0x450/0xa00 [<fffffc00005ba2a0>] device_register+0x20/0x50 [<fffffc00005df330>] ide_host_register+0x450/0xa00 [<fffffc00005df944>] ide_host_add+0x64/0xe0 [<fffffc000079b41c>] kobject_uevent_env+0x16c/0x710 [<fffffc0000310288>] do_one_initcall+0x68/0x260 [<fffffc00007b13bc>] kernel_init+0x1c/0x1a0 [<fffffc00007b13a0>] kernel_init+0x0/0x1a0 [<fffffc0000311868>] ret_from_kernel_thread+0x18/0x20 [<fffffc00007b13a0>] kernel_init+0x0/0x1a0 ---[ end trace 24a70433c3e4d374 ]--- ide0: disabling port Fix the problem by adding missing code to ide_host_register(). Fixes: 30fdfb929e82 ("PCI: Add a call to pci_assign_irq() in pci_device_probe()") Fixes: 0e4c2eeb758a ("alpha/PCI: Replace pci_fixup_irqs() call with host bridge IRQ mapping hooks") Link: http://lkml.kernel.org/r/32ec730f-c1b0-5584-cd35-f8a809122b96@roeck-us.net Reported-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> [bhelgaas: add Fixes:] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com>
* | | | | | | Merge tag 'arm64-fixes' of ↵Linus Torvalds2017-10-066-7/+45
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Catalin Marinas: - Bring initialisation of user space undefined instruction handling early (core_initcall) since late_initcall() happens after modprobe in initramfs is invoked. Similar fix for fpsimd initialisation - Increase the kernel stack when KASAN is enabled - Bring the PCI ACS enabling earlier via the iort_init_platform_devices() - Fix misleading data abort address printing (decimal vs hex) * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: Ensure fpsimd support is ready before userspace is active arm64: Ensure the instruction emulation is ready for userspace arm64: Use larger stacks when KASAN is selected ACPI/IORT: Fix PCI ACS enablement arm64: fix misleading data abort decoding
| * | | | | | | arm64: Ensure fpsimd support is ready before userspace is activeSuzuki K Poulose2017-10-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We register the pm/hotplug callbacks for FPSIMD as late_initcall, which happens after the userspace is active (from initramfs via populate_rootfs, a rootfs_initcall). Make sure we are ready even before the userspace could potentially use it, by promoting to a core_initcall. Cc: Will Deacon <will.deacon@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Dave Martin <dave.martin@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
| * | | | | | | arm64: Ensure the instruction emulation is ready for userspaceSuzuki K Poulose2017-10-062-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We trap and emulate some instructions (e.g, mrs, deprecated instructions) for the userspace. However the handlers for these are registered as late_initcalls and the userspace could be up and running from the initramfs by that time (with populate_rootfs, which is a rootfs_initcall()). This could cause problems for the early applications ending up in failure like : [ 11.152061] modprobe[93]: undefined instruction: pc=0000ffff8ca48ff4 This patch promotes the specific calls to core_initcalls, which are guaranteed to be completed before we hit userspace. Cc: stable@vger.kernel.org Cc: Dave Martin <dave.martin@arm.com> Cc: Matthias Brugger <mbrugger@suse.com> Cc: James Morse <james.morse@arm.com> Reported-by: Matwey V. Kornilov <matwey.kornilov@gmail.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
| * | | | | | | arm64: Use larger stacks when KASAN is selectedMark Rutland2017-10-041-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | AddressSanitizer instrumentation can significantly bloat the stack, and with GCC 7 this can result in stack overflows at boot time in some configurations. We can avoid this by doubling our stack size when KASAN is in use, as is already done on x86 (and has been since KASAN was introduced). Regardless of other patches to decrease KASAN's stack utilization, kernels built with KASAN will always require more stack space than those built without, and we should take this into account. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
| * | | | | | | ACPI/IORT: Fix PCI ACS enablementLorenzo Pieralisi2017-10-041-0/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit f6810c15cf97 ("iommu/arm-smmu: Clean up early-probing workarounds") removed kernel code that was allowing to initialize and probe the SMMU devices early (ie earlier than PCI devices, through linker script callback entries) in the boot process because it was not needed any longer in that the SMMU devices/drivers now support deferred probing. Since the SMMUs probe routines are also in charge of requesting global PCI ACS kernel enablement, commit f6810c15cf97 ("iommu/arm-smmu: Clean up early-probing workarounds") also postponed PCI ACS enablement to SMMUs devices probe time, which is too late given that PCI devices needs to detect if PCI ACS is enabled to init the respective capability through the following call path: pci_device_add() -> pci_init_capabilities() -> pci_enable_acs() Add code in the ACPI IORT SMMU platform devices initialization path (that is called before ACPI PCI enumeration) to detect if there exists firmware mappings to map root complexes ids to SMMU ids and if so enable ACS for the system. Fixes: f6810c15cf97 ("iommu/arm-smmu: Clean up early-probing workarounds") Reviewed-by: Robin Murphy <robin.murphy@arm.com> Tested-by: Nate Watterson <nwatters@codeaurora.org> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Hanjun Guo <hanjun.guo@linaro.org> Cc: Sudeep Holla <sudeep.holla@arm.com> Cc: Zhou Wang <wangzhou1@hisilicon.com> Cc: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
| * | | | | | | arm64: fix misleading data abort decodingMark Rutland2017-10-021-1/+1
| | |_|_|/ / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently data_abort_decode() dumps the ISS field as a decimal value with a '0x' prefix, which is somewhat misleading. Fix it to print as hexadecimal, as was intended. Fixes: 1f9b8936f36f4a8e ("arm64: Decode information from ESR upon mem faults") Reviewed-by: Dave Martin <Dave.Martin@arm.com> Reviewed-by: Julien Thierry <julien.thierry@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* | | | | | | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2017-10-067-13/+20
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull KVM fixes from Radim Krčmář: - fix PPC XIVE interrupt delivery - fix x86 RCU breakage from asynchronous page faults when built without PREEMPT_COUNT - fix x86 build with -frecord-gcc-switches - fix x86 build without X86_LOCAL_APIC * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: add X86_LOCAL_APIC dependency x86/kvm: Move kvm_fastop_exception to .fixup section kvm/x86: Avoid async PF preempting the kernel incorrectly KVM: PPC: Book3S: Fix server always zero from kvmppc_xive_get_xive()
| * | | | | | | KVM: add X86_LOCAL_APIC dependencyArnd Bergmann2017-10-051-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The rework of the posted interrupt handling broke building without support for the local APIC: ERROR: "boot_cpu_physical_apicid" [arch/x86/kvm/kvm-intel.ko] undefined! That configuration is probably not particularly useful anyway, so we can avoid the randconfig failures by adding a Kconfig dependency. Fixes: 8b306e2f3c41 ("KVM: VMX: avoid double list add with VT-d posted interrupts") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
| * | | | | | | x86/kvm: Move kvm_fastop_exception to .fixup sectionJosh Poimboeuf2017-10-051-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When compiling the kernel with the '-frecord-gcc-switches' flag, objtool complains: arch/x86/kvm/emulate.o: warning: objtool: .GCC.command.line+0x0: special: can't find new instruction And also the kernel fails to link. The problem is that the 'kvm_fastop_exception' code gets placed into the throwaway '.GCC.command.line' section instead of '.text'. Exception fixup code is conventionally placed in the '.fixup' section, so put it there where it belongs. Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
| * | | | | | | kvm/x86: Avoid async PF preempting the kernel incorrectlyBoqun Feng2017-10-043-7/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, in PREEMPT_COUNT=n kernel, kvm_async_pf_task_wait() could call schedule() to reschedule in some cases. This could result in accidentally ending the current RCU read-side critical section early, causing random memory corruption in the guest, or otherwise preempting the currently running task inside between preempt_disable and preempt_enable. The difficulty to handle this well is because we don't know whether an async PF delivered in a preemptible section or RCU read-side critical section for PREEMPT_COUNT=n, since preempt_disable()/enable() and rcu_read_lock/unlock() are both no-ops in that case. To cure this, we treat any async PF interrupting a kernel context as one that cannot be preempted, preventing kvm_async_pf_task_wait() from choosing the schedule() path in that case. To do so, a second parameter for kvm_async_pf_task_wait() is introduced, so that we know whether it's called from a context interrupting the kernel, and the parameter is set properly in all the callsites. Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Cc: stable@vger.kernel.org Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
| * | | | | | | KVM: PPC: Book3S: Fix server always zero from kvmppc_xive_get_xive()Sam Bobroff2017-10-032-4/+2
| |/ / / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In KVM's XICS-on-XIVE emulation, kvmppc_xive_get_xive() returns the value of state->guest_server as "server". However, this value is not set by it's counterpart kvmppc_xive_set_xive(). When the guest uses this interface to migrate interrupts away from a CPU that is going offline, it sees all interrupts as belonging to CPU 0, so they are left assigned to (now) offline CPUs. This patch removes the guest_server field from the state, and returns act_server in it's place (that is, the CPU actually handling the interrupt, which may differ from the one requested). Fixes: 5af50993850a ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller") Cc: stable@vger.kernel.org Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
* | | | | | | Merge tag 'for-linus' of ↵Linus Torvalds2017-10-069-14/+34
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma fixes from Doug Ledford: "This is a pretty small pull request. Only 6 patches in total. There are no outstanding -rc patches on the mailing list after this pull request, so only if some new issues are discovered in the remainder of the rc cycles will you hear from me again. Summary: - a fix for iwpm netlink usage - a fix for error unwinding in mlx5 - two fixes to vlan handling in qedr - a couple small i40iw fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: i40iw: Fix port number for query QP i40iw: Add missing memory barriers RDMA/qedr: Parse vlan priority as sl RDMA/qedr: Parse VLAN ID correctly and ignore the value of zero IB/mlx5: Fix label order in error path handling RDMA/iwpm: Properly mark end of NL messages
| * | | | | | | i40iw: Fix port number for query QPMustafa Ismail2017-10-041-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Port number 0 is an invalid port number. Set it to 1 as there is one port per i40iw device. Fixes: d37498417947 ("i40iw: add files for iwarp interface") Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | | | | i40iw: Add missing memory barriersMustafa Ismail2017-10-043-8/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove duplicate set_64bit_val call to offset 24. Replace some instances of set_64bit_val with i40iw_insert_wqe_hdr as valid bit needs a write barrier and should be the last write operation for the WQE. Fixes: 786c6adb3a94 ("i40iw: add puda code") Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | | | | RDMA/qedr: Parse vlan priority as slAmrani, Ram2017-10-041-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Parse the vlan priority from the vlan tag and configure it to the WC's sl field. Fixes: abd49676c707 ("qed: Add RoCE ll2 & GSI support") Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | | | | RDMA/qedr: Parse VLAN ID correctly and ignore the value of zeroAmrani, Ram2017-10-042-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rename vlan_id field name to vlan as it contains more than the vlan_id. Mask out non vlan id fields from vlan tag of the QED LL2 RX GSI vlan output. As it is expected to be vlan id only. Ignore vlan_id with value of zero. Fixes: abd49676c707 ("qed: Add RoCE ll2 & GSI support") Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | | | | IB/mlx5: Fix label order in error path handlingParav Pandit2017-10-041-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When UAR get_page fails, it needs to continue to cleanup debugfs for congestion control parameters. Labels for error path were incorrectly ordered. This patch fixes to do correct cleanup on debugfs init failure and uar get page failure. Fixes: 4a2da0b8c078 ("IB/mlx5: Add debug control parameters for congestion control") Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | | | | RDMA/iwpm: Properly mark end of NL messagesShiraz Saleem2017-09-292-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 1a1c116f3dcf removes nlmsg_len calculation in ibnl_put_attr causing netlink messages to be rejected due to incorrect length. Add nlmsg_end after all attributes are appended to calculate the nlmsg_len. Fixes: 1a1c116f3dcf ("RDMA/netlink: Simplify the put_msg and put_attr") Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* | | | | | | | Merge branch 'for-4.14-rc4' of ↵Linus Torvalds2017-10-062-2/+2
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "Two more fixes for bugs introduced in 4.13. The sector_t problem with 32bit architecture and !LBDAF config seems serious but the number of affected deployments is hopefully low. The clashing status bits could lead to a confusing in-memory state of the whole-filesystem operations if used with the quota override sysfs knob" * 'for-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: Btrfs: fix overlap of fs_info::flags values btrfs: avoid overflow when sector_t is 32 bit
| * | | | | | | | Btrfs: fix overlap of fs_info::flags valuesTsutomu Itoh2017-10-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Because the values of BTRFS_FS_EXCL_OP and BTRFS_FS_QUOTA_OVERRIDE overlap, we should change the value. First, BTRFS_FS_EXCL_OP was set to 14. commit 171938e52807 ("btrfs: track exclusive filesystem operation in flags") Next, the value of BTRFS_FS_QUOTA_OVERRIDE was set to 14. commit f29efe292198 ("btrfs: add quota override flag to enable quota override for CAP_SYS_RESOURCE") As a result, the value 14 overlapped, by accident. This problem is solved by defining the value of BTRFS_FS_EXCL_OP as 16, the flags are internal. Fixes: f29efe292198 ("btrfs: add quota override flag to enable quota override for CAP_SYS_RESOURCE") CC: stable@vger.kernel.org # 4.13+ Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> [ minimize the change, update only BTRFS_FS_EXCL_OP ] Signed-off-by: David Sterba <dsterba@suse.com>
| * | | | | | | | btrfs: avoid overflow when sector_t is 32 bitGoffredo Baroncelli2017-10-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Jean-Denis Girard noticed commit c821e7f3 "pass bytes to btrfs_bio_alloc" (https://patchwork.kernel.org/patch/9763081/) introduces a regression on 32 bit machines. When CONFIG_LBDAF is _not_ defined (CONFIG_LBDAF == Support for large (2TB+) block devices and files) sector_t is 32 bit on 32bit machines. In the function submit_extent_page, 'sector' (which is sector_t type) is multiplied by 512 to convert it from sectors to bytes, leading to an overflow when the disk is bigger than 4GB (!). I added a cast to u64 to avoid overflow. Fixes: c821e7f3 ("btrfs: pass bytes to btrfs_bio_alloc") CC: stable@vger.kernel.org # 4.13+ Signed-off-by: Goffredo Baroncelli <kreijack@inwind.it> Tested-by: Jean-Denis Girard <jd.girard@sysnux.pf> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
* | | | | | | | | Merge tag 'ceph-for-4.14-rc4' of git://github.com/ceph/ceph-clientLinus Torvalds2017-10-062-10/+9
|\ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull ceph fixes from Ilya Dryomov: "Two fixups for CephFS snapshot-handling patches in -rc1" * tag 'ceph-for-4.14-rc4' of git://github.com/ceph/ceph-client: ceph: fix __choose_mds() for LSSNAP request ceph: properly queue cap snap for newly created snap realm
| * | | | | | | | | ceph: fix __choose_mds() for LSSNAP requestYan, Zheng2017-10-021-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | previous commit 5d37ca14 "ceph: send LSSNAP request to auth mds of directory inode" is buggy. It makes __choose_mds() choose mds base on hash of '.snap' dentry. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | | | | | | | | ceph: properly queue cap snap for newly created snap realmYan, Zheng2017-10-021-5/+3
| | |_|/ / / / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 3ae0bebc "ceph: queue cap snap only when snap realm's context changes" introduced a regression: we may not call queue_realm_cap_snaps() for newly created snap realm. This regression allows unflushed snapshot data to be overwritten. Link: http://tracker.ceph.com/issues/21483 Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* | | | | | | | | Merge branch 'overlayfs-linus' of ↵Linus Torvalds2017-10-0610-37/+60
|\ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs fixes from Miklos Szeredi: "Fix a regression in 4.14 and one in 4.13. The latter is a case when Docker is doing something it really shouldn't and gets away with it. We now print a warning instead of erroring out. There are also fixes to several error paths" * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: fix regression caused by exclusive upper/work dir protection ovl: fix missing unlock_rename() in ovl_do_copy_up() ovl: fix dentry leak in ovl_indexdir_cleanup() ovl: fix dput() of ERR_PTR in ovl_cleanup_index() ovl: fix error value printed in ovl_lookup_index() ovl: fix may_write_real() for overlayfs directories
| * | | | | | | | | ovl: fix regression caused by exclusive upper/work dir protectionAmir Goldstein2017-10-053-9/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Enforcing exclusive ownership on upper/work dirs caused a docker regression: https://github.com/moby/moby/issues/34672. Euan spotted the regression and pointed to the offending commit. Vivek has brought the regression to my attention and provided this reproducer: Terminal 1: mount -t overlay -o workdir=work,lowerdir=lower,upperdir=upper none merged/ Terminal 2: unshare -m Terminal 1: umount merged mount -t overlay -o workdir=work,lowerdir=lower,upperdir=upper none merged/ mount: /root/overlay-testing/merged: none already mounted or mount point busy To fix the regression, I replaced the error with an alarming warning. With index feature enabled, mount does fail, but logs a suggestion to override exclusive dir protection by disabling index. Note that index=off mount does take the inuse locks, so a concurrent index=off will issue the warning and a concurrent index=on mount will fail. Documentation was updated to reflect this change. Fixes: 2cac0c00a6cd ("ovl: get exclusive ownership on upper/work dirs") Cc: <stable@vger.kernel.org> # v4.13 Reported-by: Euan Kemp <euank@euank.com> Reported-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | | | | | | | | ovl: fix missing unlock_rename() in ovl_do_copy_up()Amir Goldstein2017-10-054-24/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the ovl_lock_rename_workdir() helper which requires unlock_rename() only on lock success. Fixes: ("fd210b7d67ee ovl: move copy up lock out") Cc: <stable@vger.kernel.org> # v4.13 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | | | | | | | | ovl: fix dentry leak in ovl_indexdir_cleanup()Amir Goldstein2017-10-051-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | index dentry was not released when breaking out of the loop due to index verification error. Fixes: 415543d5c64f ("ovl: cleanup bad and stale index entries on mount") Cc: <stable@vger.kernel.org> # v4.13 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | | | | | | | | ovl: fix dput() of ERR_PTR in ovl_cleanup_index()Amir Goldstein2017-10-051-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixes: caf70cb2ba5d ("ovl: cleanup orphan index entries") Cc: <stable@vger.kernel.org> # v4.13 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | | | | | | | | ovl: fix error value printed in ovl_lookup_index()Amir Goldstein2017-10-051-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixes: 359f392ca53e ("ovl: lookup index entry for copy up origin") Cc: <stable@vger.kernel.org> # v4.13 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | | | | | | | | ovl: fix may_write_real() for overlayfs directoriesAmir Goldstein2017-10-051-1/+3
| |/ / / / / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Overlayfs directory file_inode() is the overlay inode whether the real inode is upper or lower. This fixes a regression in xfstest generic/158. Fixes: 7c6893e3c9ab ("ovl: don't allow writing ioctl on lower layer") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* | | | | | | | | Merge tag 'powerpc-4.14-4' of ↵Linus Torvalds2017-10-068-9/+48
|\ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "Nine small fixes, really nothing that stands out. A work-around for a spurious MCE on Power9. A CXL fault handling fix, some fixes to the new XIVE code, and a fix to the new 32-bit STRICT_KERNEL_RWX code. Fixes for old code/stable: an fix to an incorrect TLB flush on boot but not on any current machines, a compile error on 4xx and a fix to memory hotplug when using radix (Power9). Thanks to: Anton Blanchard, Cédric Le Goater, Christian Lamparter, Christophe Leroy, Christophe Lombard, Guenter Roeck, Jeremy Kerr, Michael Neuling, Nicholas Piggin" * tag 'powerpc-4.14-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/powernv: Increase memory block size to 1GB on radix powerpc/mm: Call flush_tlb_kernel_range with interrupts enabled powerpc/xive: Clear XIVE internal structures when a CPU is removed powerpc/xive: Fix IPI reset powerpc/4xx: Fix compile error with 64K pages on 40x, 44x powerpc: Fix action argument for cpufeatures-based TLB flush cxl: Fix memory page not handled powerpc: Fix workaround for spurious MCE on POWER9 powerpc: Handle MCE on POWER9 with only DSISR bit 30 set
| * | | | | | | | | powerpc/powernv: Increase memory block size to 1GB on radixAnton Blanchard2017-10-061-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Memory hot unplug on PowerNV radix hosts is broken. Our memory block size is 256MB but since we map the linear region with very large pages, each pte we tear down maps 1GB. A hot unplug of one 256MB memory block results in 768MB of memory getting unintentionally unmapped. At this point we are likely to oops. Fix this by increasing our memory block size to 1GB on PowerNV radix hosts. Fixes: 4b5d62ca17a1 ("powerpc/mm: add radix__remove_section_mapping()") Cc: stable@vger.kernel.org # v4.11+ Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | | | | | powerpc/mm: Call flush_tlb_kernel_range with interrupts enabledGuenter Roeck2017-10-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | flush_tlb_kernel_range() may call smp_call_function_many() which expects interrupts to be enabled. This results in a traceback. WARNING: CPU: 0 PID: 1 at kernel/smp.c:416 smp_call_function_many+0xcc/0x2fc CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.14.0-rc1-00009-g0666f56 #1 task: cf830000 task.stack: cf82e000 NIP: c00a93c8 LR: c00a9634 CTR: 00000001 REGS: cf82fde0 TRAP: 0700 Not tainted (4.14.0-rc1-00009-g0666f56) MSR: 00021000 <CE,ME> CR: 24000082 XER: 00000000 GPR00: c00a9634 cf82fe90 cf830000 c050ad3c c0015a54 00000000 00000001 00000001 GPR08: 00000001 00000000 00000000 cf82e000 24000084 00000000 c0003150 00000000 GPR16: 00000000 00000000 00000000 00000000 00000000 00000001 00000000 c0510000 GPR24: 00000000 c0015a54 00000000 c050ad3c c051823c c050ad3c 00000025 00000000 NIP [c00a93c8] smp_call_function_many+0xcc/0x2fc LR [c00a9634] smp_call_function+0x3c/0x50 Call Trace: [cf82fe90] [00000010] 0x10 (unreliable) [cf82fed0] [c00a9634] smp_call_function+0x3c/0x50 [cf82fee0] [c0015d2c] flush_tlb_kernel_range+0x20/0x38 [cf82fef0] [c001524c] mark_initmem_nx+0x154/0x16c [cf82ff20] [c001484c] free_initmem+0x20/0x4c [cf82ff30] [c000316c] kernel_init+0x1c/0x108 [cf82ff40] [c000f3a8] ret_from_kernel_thread+0x5c/0x64 Instruction dump: 7c0803a6 7d808120 38210040 4e800020 3d20c052 812981a0 2f890000 40beffac 3d20c051 8929ac64 2f890000 40beff9c <0fe00000> 4bffff94 7fc3f378 7f64db78 Fixes: 3184cc4b6f6a ("powerpc/mm: Fix kernel RAM protection after freeing ...") Fixes: e611939fc8ec ("powerpc/mm: Ensure change_page_attr() doesn't ...") Cc: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | | | | | powerpc/xive: Clear XIVE internal structures when a CPU is removedCédric Le Goater2017-10-041-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit eac1e731b59e ("powerpc/xive: guest exploitation of the XIVE interrupt controller") introduced support for the XIVE exploitation mode of the P9 interrupt controller on the pseries platform. At that time, support for CPU removal was not complete on PowerVM and CPU hot unplug remained untested. It appears that some cleanups of the XIVE internal structures are required before releasing the CPU, without which the kernel crashes in a RTAS call doing the CPU isolation. These changes fix the crash by deconfiguring the IPI interrupt source and clearing the event queues of the CPU when it is removed. Fixes: eac1e731b59e ("powerpc/xive: guest exploitation of the XIVE interrupt controller") Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | | | | | powerpc/xive: Fix IPI resetCédric Le Goater2017-10-041-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When resetting an IPI, hw_ipi should also be set to zero. Fixes: eac1e731b59e ("powerpc/xive: guest exploitation of the XIVE interrupt controller") Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | | | | | powerpc/4xx: Fix compile error with 64K pages on 40x, 44xChristian Lamparter2017-10-031-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The mmu context on the 40x, 44x does not define pte_frag entry. This causes gcc abort the compilation due to: setup-common.c: In function ‘setup_arch’: setup-common.c:908: error: ‘mm_context_t’ has no ‘pte_frag’ This patch fixes the issue by removing the pte_frag initialization in setup-common.c. This is possible, because the compiler will do the initialization, since the mm_context is a sub struct of init_mm. init_mm is declared in mm_types.h as external linkage. According to C99 6.2.4.3: An object whose identifier is declared with external linkage [...] has static storage duration. C99 defines in 6.7.8.10 that: If an object that has static storage duration is not initialized explicitly, then: - if it has pointer type, it is initialized to a null pointer Fixes: b1923caa6e64 ("powerpc: Merge 32-bit and 64-bit setup_arch()") Signed-off-by: Christian Lamparter <chunkeey@gmail.com> Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | | | | | powerpc: Fix action argument for cpufeatures-based TLB flushJeremy Kerr2017-10-031-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 41d0c2ecde19 ("powerpc/powernv: Fix local TLB flush for boot and MCE on POWER9") introduced calls to __flush_tlb_power[89] from the cpufeatures code, specifying the number of sets to flush. However, these functions take an action argument, not a number of sets. This means we hit the BUG() in __flush_tlb_{206,300} when using cpufeatures-style configuration. This change passes TLB_INVAL_SCOPE_GLOBAL instead. Fixes: 41d0c2ecde19 ("powerpc/powernv: Fix local TLB flush for boot and MCE on POWER9") Cc: stable@vger.kernel.org # v4.13+ Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | | | | | cxl: Fix memory page not handledChristophe Lombard2017-09-291-2/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The in-kernel 'library' API can be called by drivers to help interaction with an IBM XSL on a POWER9 system. The cxllib_handle_fault() API is used to handle memory fault. All memory pages of the specified buffer have to be handled but under certain conditions,the last page may not be touched, and the address the adapter is trying to access is never sent to the kernel for resolution. This patch reworks start address of the loop with an address aligned on the page size. In this context, the last page is not missed. Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Fixes: 3ced8d730063 ("cxl: Export library to support IBM XSL"); Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | | | | | powerpc: Fix workaround for spurious MCE on POWER9Michael Neuling2017-09-291-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the recent commit d8bd9f3f0925 ("powerpc: Handle MCE on POWER9 with only DSISR bit 30 set") I screwed up the bit number. It should be bit 25 (IBM bit 38). Fixes: d8bd9f3f0925 ("powerpc: Handle MCE on POWER9 with only DSISR bit 30 set") Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
OpenPOWER on IntegriCloud