summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
...
| * | | Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds2014-09-072-12/+12
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU fix from Ingo Molnar: "A boot hang fix for the offloaded callback RCU model (RCU_NOCB_CPU=y && (TREE_CPU=y || TREE_PREEMPT_RC)) in certain bootup scenarios" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rcu: Make nocb leader kthreads process pending callbacks after spawning
| | * \ \ Merge branch 'rcu/urgent' of ↵Ingo Molnar2014-09-032-12/+12
| | |\ \ \ | | | |_|/ | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/urgent Pull an RCU fix from Paul E. McKenney: "This series contains a single commit fixing an initialization bug reported by Amit Shah and fixed by Pranith Kumar (and tested by Amit). This bug results in a boot-time hang in callback-offloaded configurations where callbacks were posted before the offloading ('rcuo') kthreads were created." Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | | * | rcu: Make nocb leader kthreads process pending callbacks after spawningPranith Kumar2014-08-282-12/+12
| | | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The nocb callbacks generated before the nocb kthreads are spawned are enqueued in the nocb queue for later processing. Commit fbce7497ee5af ("rcu: Parallelize and economize NOCB kthread wakeups") introduced nocb leader kthreads which checked the nocb_leader_wake flag to see if there were any such pending callbacks. A case was reported in which newly spawned leader kthreads were not processing the pending callbacks as this flag was not set, which led to a boot hang. The following commit ensures that the newly spawned nocb kthreads process the pending callbacks by allowing the kthreads to run immediately after spawning instead of waiting. This is done by inverting the logic of nocb_leader_wake tests to nocb_leader_sleep which allows us to use the default initialization of this flag to 0 to let the kthreads run. Reported-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Link: http://www.spinics.net/lists/kernel/msg1802899.html [ paulmck: Backported to v3.17-rc2. ] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Amit Shah <amit.shah@redhat.com>
| * | | Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds2014-09-074-11/+39
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Thomas Gleixner: "Three fixlets from the timer departement: - Update the timekeeper before updating vsyscall and pvclock. This fixes the kvm-clock regression reported by Chris and Paolo. - Use the proper irq work interface from NMI. This fixes the regression reported by Catalin and Dave. - Clarify the compat_nanosleep error handling mechanism to avoid future confusion" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timekeeping: Update timekeeper before updating vsyscall and pvclock compat: nanosleep: Clarify error handling nohz: Restore NMI safe local irq work for local nohz kick
| | * | | timekeeping: Update timekeeper before updating vsyscall and pvclockThomas Gleixner2014-09-061-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The update_walltime() code works on the shadow timekeeper to make the seqcount protected region as short as possible. But that update to the shadow timekeeper does not update all timekeeper fields because it's sufficient to do that once before it becomes life. One of these fields is tkr.base_mono. That stays stale in the shadow timekeeper unless an operation happens which copies the real timekeeper to the shadow. The update function is called after the update calls to vsyscall and pvclock. While not correct, it did not cause any problems because none of the invoked update functions used base_mono. commit cbcf2dd3b3d4 (x86: kvm: Make kvm_get_time_and_clockread() nanoseconds based) changed that in the kvm pvclock update function, so the stale mono_base value got used and caused kvm-clock to malfunction. Put the update where it belongs and fix the issue. Reported-by: Chris J Arges <chris.j.arges@canonical.com> Reported-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Gleb Natapov <gleb@kernel.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1409050000570.3333@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| | * | | compat: nanosleep: Clarify error handlingThomas Gleixner2014-09-061-3/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The error handling in compat_sys_nanosleep() is correct, but completely non obvious. Document it and restrict it to the -ERESTART_RESTARTBLOCK return value for clarity. Reported-by: Kees Cook <keescook@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| | * | | nohz: Restore NMI safe local irq work for local nohz kickFrederic Weisbecker2014-09-042-6/+15
| | |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The local nohz kick is currently used by perf which needs it to be NMI-safe. Recent commit though (7d1311b93e58ed55f3a31cc8f94c4b8fe988a2b9) changed its implementation to fire the local kick using the remote kick API. It was convenient to make the code more generic but the remote kick isn't NMI-safe. As a result: WARNING: CPU: 3 PID: 18062 at kernel/irq_work.c:72 irq_work_queue_on+0x11e/0x140() CPU: 3 PID: 18062 Comm: trinity-subchil Not tainted 3.16.0+ #34 0000000000000009 00000000903774d1 ffff880244e06c00 ffffffff9a7f1e37 0000000000000000 ffff880244e06c38 ffffffff9a0791dd ffff880244fce180 0000000000000003 ffff880244e06d58 ffff880244e06ef8 0000000000000000 Call Trace: <NMI> [<ffffffff9a7f1e37>] dump_stack+0x4e/0x7a [<ffffffff9a0791dd>] warn_slowpath_common+0x7d/0xa0 [<ffffffff9a07930a>] warn_slowpath_null+0x1a/0x20 [<ffffffff9a17ca1e>] irq_work_queue_on+0x11e/0x140 [<ffffffff9a10a2c7>] tick_nohz_full_kick_cpu+0x57/0x90 [<ffffffff9a186cd5>] __perf_event_overflow+0x275/0x350 [<ffffffff9a184f80>] ? perf_event_task_disable+0xa0/0xa0 [<ffffffff9a01a4cf>] ? x86_perf_event_set_period+0xbf/0x150 [<ffffffff9a187934>] perf_event_overflow+0x14/0x20 [<ffffffff9a020386>] intel_pmu_handle_irq+0x206/0x410 [<ffffffff9a0b54d3>] ? arch_vtime_task_switch+0x63/0x130 [<ffffffff9a01937b>] perf_event_nmi_handler+0x2b/0x50 [<ffffffff9a007b72>] nmi_handle+0xd2/0x390 [<ffffffff9a007aa5>] ? nmi_handle+0x5/0x390 [<ffffffff9a0d131b>] ? lock_release+0xab/0x330 [<ffffffff9a008062>] default_do_nmi+0x72/0x1c0 [<ffffffff9a0c925f>] ? cpuacct_account_field+0xcf/0x200 [<ffffffff9a008268>] do_nmi+0xb8/0x100 Lets fix this by restoring the use of local irq work for the nohz local kick. Reported-by: Catalin Iacob <iacobcatalin@gmail.com> Reported-and-tested-by: Dave Jones <davej@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
| * | | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2014-09-068-17/+28
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull kvm fixes from Paolo Bonzini: "A smattering of bug fixes across most architectures" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: powerpc/kvm/cma: Fix panic introduces by signed shift operation KVM: s390/mm: Fix guest storage key corruption in ptep_set_access_flags KVM: s390/mm: Fix storage key corruption during swapping arm/arm64: KVM: Complete WFI/WFE instructions ARM/ARM64: KVM: Nuke Hyp-mode tlbs before enabling MMU KVM: s390/mm: try a cow on read only pages for key ops KVM: s390: Fix user triggerable bug in dead code
| | * | | powerpc/kvm/cma: Fix panic introduces by signed shift operationLaurent Dufour2014-09-031-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | fc95ca7284bc54953165cba76c3228bd2cdb9591 introduces a memset in kvmppc_alloc_hpt since the general CMA doesn't clear the memory it allocates. However, the size argument passed to memset is computed from a signed value and its signed bit is extended by the cast the compiler is doing. This lead to extremely large size value when dealing with order value >= 31, and almost all the memory following the allocated space is cleaned. As a consequence, the system is panicing and may even fail spawning the kdump kernel. This fix makes use of an unsigned value for the memset's size argument to avoid sign extension. Among this fix, another shift operation which may lead to signed extended value too is also fixed. Cc: Alexey Kardashevskiy <aik@ozlabs.ru> Cc: Paul Mackerras <paulus@samba.org> Cc: Alexander Graf <agraf@suse.de> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| | * | | Merge tag 'kvm-s390-master-20140902' of ↵Paolo Bonzini2014-09-021-2/+4
| | |\ \ \ | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-master
| | | * | | KVM: s390/mm: Fix guest storage key corruption in ptep_set_access_flagsChristian Borntraeger2014-09-021-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 0944fe3f4a32 ("s390/mm: implement software referenced bits") triggered another paging/storage key corruption. There is an unhandled invalid->valid pte change where we have to set the real storage key from the pgste. When doing paging a guest page might be swapcache or swap and when faulted in it might be read-only and due to a parallel scan old. An do_wp_page will make it writeable and young. Due to software reference tracking this page was invalid and now becomes valid. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: stable@vger.kernel.org # v3.12+
| | | * | | KVM: s390/mm: Fix storage key corruption during swappingChristian Borntraeger2014-09-021-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since 3.12 or more precisely commit 0944fe3f4a32 ("s390/mm: implement software referenced bits") guest storage keys get corrupted during paging. This commit added another valid->invalid translation for page tables - namely ptep_test_and_clear_young. We have to transfer the storage key into the pgste in that case. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: stable@vger.kernel.org # v3.12+
| | * | | | Merge tag 'kvm-arm-for-v3.17-rc3' of ↵Paolo Bonzini2014-08-294-0/+12
| | |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD These fixes fix two issues in KVM for arm/arm64: - hyp mode initialization issues on certian boards/bootloader combos. - incorrect return address from trapped WFI/WFE instrucitons, which breaks non-linux guests.
| | | * | | | arm/arm64: KVM: Complete WFI/WFE instructionsChristoffer Dall2014-08-292-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The architecture specifies that when the processor wakes up from a WFE or WFI instruction, the instruction is considered complete, however we currrently return to EL1 (or EL0) at the WFI/WFE instruction itself. While most guests may not be affected by this because their local exception handler performs an exception returning setting the event bit or with an interrupt pending, some guests like UEFI will get wedged due this little mishap. Simply skip the instruction when we have completed the emulation. Cc: <stable@vger.kernel.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| | | * | | | ARM/ARM64: KVM: Nuke Hyp-mode tlbs before enabling MMUPranavkumar Sawargaonkar2014-08-292-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | X-Gene u-boot runs in EL2 mode with MMU enabled hence we might have stale EL2 tlb enteris when we enable EL2 MMU on each host CPU. This can happen on any ARM/ARM64 board running bootloader in Hyp-mode (or EL2-mode) with MMU enabled. This patch ensures that we flush all Hyp-mode (or EL2-mode) TLBs on each host CPU before enabling Hyp-mode (or EL2-mode) MMU. Cc: <stable@vger.kernel.org> Tested-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org> Signed-off-by: Anup Patel <anup.patel@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| | * | | | | Merge tag 'kvm-s390-20140825' of ↵Paolo Bonzini2014-08-252-13/+10
| | |\ \ \ \ \ | | | |/ / / / | | |/| / / / | | | |/ / / | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-master Here are two fixes for s390 KVM code that prevent: 1. a malicious user to trigger a kernel BUG 2. a malicious user to change the storage key of read-only pages
| | | * | | KVM: s390/mm: try a cow on read only pages for key opsChristian Borntraeger2014-08-251-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The PFMF instruction handler blindly wrote the storage key even if the page was mapped R/O in the host. Lets try a COW before continuing and bail out in case of errors. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Cc: stable@vger.kernel.org
| | | * | | KVM: s390: Fix user triggerable bug in dead codeChristian Borntraeger2014-08-251-13/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the early days, we had some special handling for the KVM_EXIT_S390_SIEIC exit, but this was gone in 2009 with commit d7b0b5eb3000 (KVM: s390: Make psw available on all exits, not just a subset). Now this switch statement is just a sanity check for userspace not messing with the kvm_run structure. Unfortunately, this allows userspace to trigger a kernel BUG. Let's just remove this switch statement. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Cc: stable@vger.kernel.org
| * | | | | Merge tag 'fixes-for-linus' of ↵Linus Torvalds2014-09-0612-59/+90
| |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC fixes from Kevin Hilman: "Another round of fixes from arm-soc land, which are mostly DT fixes for: - OMAP: handful of DT fixes devices on newly supported hardware - davinci: fix 2nd EDMA channel - ux500: extend previous pinctrl fix to another board - at91: clock registration fixes, compatibility string precision And one more fix for event cleanup in drivers/bus/arm-ccn" * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: bus: arm-ccn: Move event cleanup routine ARM: at91/dt: rm9200: fix usb clock definition ARM: at91: rm9200: fix clock registration ARM: at91/dt: sam9g20: set at91sam9g20 pllb driver ARM: dts: dra7-evm: Add vtt regulator support ARM: dts: dra7-evm: Fix spi1 mux documentation ARM: dts: am43x-epos-evm: Disable QSPI to prevent conflict with GPMC-NAND ARM: OMAP2+: gpmc: Don't complain if wait pin is used without r/w monitoring ARM: dts: am43xx-epos-evm: Don't use read/write wait monitoring ARM: dts: am437x-gp-evm: Don't use read/write wait monitoring ARM: dts: am437x-gp-evm: Use BCH16 ECC scheme instead of BCH8 ARM: dts: am43x-epos-evm: Use BCH16 ECC scheme instead of BCH8 ARM: dts: am4372: fix USB regs size ARM: dts: am437x-gp: switch i2c0 to 100KHz ARM: dts: dra7-evm: Fix 8th NAND partition's name ARM: dts: dra7-evm: Fix i2c3 pinmux and frequency ARM: ux500: disable msp2 node on Snowball ARM: edma: Fix configuration parsing for SoCs with multiple eDMA3 CC ARM: dts: set 'ti,set-rate-parent' for dpll4_m5x2 clock
| | * \ \ \ \ Merge tag 'at91-fixes' of git://github.com/at91linux/linux-at91 into fixesKevin Hilman2014-09-053-2/+12
| | |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge "at91: fixes for 3.17 #1" from Nicols Ferre: First AT91 fixes batch for 3.17: - compatibility string precision - clock registration and USB DT fix for at91rm9200 * tag 'at91-fixes' of git://github.com/at91linux/linux-at91: ARM: at91/dt: rm9200: fix usb clock definition ARM: at91: rm9200: fix clock registration ARM: at91/dt: sam9g20: set at91sam9g20 pllb driver Signed-off-by: Kevin Hilman <khilman@linaro.org>
| | | * | | | | ARM: at91/dt: rm9200: fix usb clock definitionAlexandre Belloni2014-09-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The atmel,clk-divisors property is taking 4 divisors, if less are provided, the clock registration will fail. Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com> Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
| | | * | | | | ARM: at91: rm9200: fix clock registrationAlexandre Belloni2014-09-051-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Actually register clocks from device tree when using the common clock framework. Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com> Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com> [nicolas.ferre@atmel.com: add at91 to function name] Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
| | | * | | | | ARM: at91/dt: sam9g20: set at91sam9g20 pllb driverGaël PORTAY2014-09-051-0/+1
| | | | |_|/ / | | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The at91sam9g20 SOC uses its own pllb implementation which is different from the one inherited from at91sam9260 SOC. Signed-off-by: Gaël PORTAY <gael.portay@gmail.com> Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
| | * | | | | bus: arm-ccn: Move event cleanup routinePawel Moll2014-09-051-26/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The function cleaning up an initialized event was called from the "event_del" handler, instead of being used as the "destroy" callback. In case of events group allocation this caused NULL pointer dereference (as events are added and deleted multiple times then). Fixed now. Signed-off-by: Pawel Moll <mail@pawelmoll.com> Signed-off-by: Kevin Hilman <khilman@linaro.org>
| | * | | | | Merge tag 'omap-fixes-against-v3.17-rc3' of ↵Kevin Hilman2014-09-05317-1293/+2620
| | |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into fixes Merge "omap fixes against v3.17-rc3" from Tony Lindgren: Few fixes for omaps mostly for various devices to get them working properly on the new am437x and dra7 hardware for several devices such as I2C, NAND, DDR3 and USB. There's also a clock fix for omap3. And also included are two minor cosmetic fixes that are not stictly fixes for the new hardware support added recently to downgrade a GPMC warning into a debug statement, and fix the confusing comments for dra7-evm spi1 mux. Note that these are all .dts changes except for a GPMC change. * tag 'omap-fixes-against-v3.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap: (255 commits) ARM: dts: dra7-evm: Add vtt regulator support ARM: dts: dra7-evm: Fix spi1 mux documentation ARM: dts: am43x-epos-evm: Disable QSPI to prevent conflict with GPMC-NAND ARM: OMAP2+: gpmc: Don't complain if wait pin is used without r/w monitoring ARM: dts: am43xx-epos-evm: Don't use read/write wait monitoring ARM: dts: am437x-gp-evm: Don't use read/write wait monitoring ARM: dts: am437x-gp-evm: Use BCH16 ECC scheme instead of BCH8 ARM: dts: am43x-epos-evm: Use BCH16 ECC scheme instead of BCH8 ARM: dts: am4372: fix USB regs size ARM: dts: am437x-gp: switch i2c0 to 100KHz ARM: dts: dra7-evm: Fix 8th NAND partition's name ARM: dts: dra7-evm: Fix i2c3 pinmux and frequency Linux 3.17-rc3 ... Signed-off-by: Kevin Hilman <khilman@linaro.org>
| | | * | | | | ARM: dts: dra7-evm: Add vtt regulator supportLokesh Vutla2014-09-041-1/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | DRA7 evm REV G and later boards uses a vtt regulator for DDR3 termination and this is controlled by gpio7_11. This gpio is configured in boot loader. gpio7_11, which is only available only on Pad A22, in previous boards, is connected only to an unused pad on expansion connector EXP_P3 and is safe to be muxed as GPIO on all DRA7-evm versions (without a need to spin off another dts file). Since gpio7_11 is used to control VTT and should not be reset or kept in idle state during boot up else VTT will be disconnected and DDR gets corrupted. So, as part of this change, mark gpio7 as no-reset and no-idle on init. Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com> Signed-off-by: Nishanth Menon <nm@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com>
| | | * | | | | ARM: dts: dra7-evm: Fix spi1 mux documentationNishanth Menon2014-09-041-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While auditing the various pin ctrl configurations using the following command: grep PIN_ arch/arm/boot/dts/dra7-evm.dts|(while read line; do v=`echo "$line" | sed -e "s/\s\s*/|/g" | cut -d '|' -f1 | cut -d 'x' -f2|tr [a-z] [A-Z]`; HEX=`echo "obase=16;ibase=16;4A003400+$v"| bc`; echo "$HEX ===> $line"; done) against DRA75x/74x NDA TRM revision S(SPRUHI2S August 2014), documentation errors were found for spi1 pinctrl. Fix the same. Fixes: 6e58b8f1daaf1af ("ARM: dts: DRA7: Add the dts files for dra7 SoC and dra7-evm board") Signed-off-by: Nishanth Menon <nm@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com>
| | | * | | | | ARM: dts: am43x-epos-evm: Disable QSPI to prevent conflict with GPMC-NANDRoger Quadros2014-09-041-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Both QSPI and GPMC-NAND share the same Pin (A8) from the SoC for Chip Select functionality. So both can't be enabled simultaneously. Disable QSPI node to prevent the pin conflict as well as be similar to 3.12 release. CC: Sourav Poddar <sourav.poddar@ti.com> Signed-off-by: Roger Quadros <rogerq@ti.com> Reviewed-by: Pekon Gupta <pekon@pek-sem.com> Signed-off-by: Tony Lindgren <tony@atomide.com>
| | | * | | | | ARM: OMAP2+: gpmc: Don't complain if wait pin is used without r/w monitoringRoger Quadros2014-09-041-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For NAND read & write wait pin monitoring must be kept disabled as the wait pin is only used to indicate NAND device ready status and not to extend each read/write cycle. So don't print a warning if wait pin is specified while read/write monitoring is not in the device tree. Sanity check wait pin number irrespective if read/write monitoring is set or not. Signed-off-by: Roger Quadros <rogerq@ti.com> Reviewed-by: Pekon Gupta <pekon@pek-sem.com> Signed-off-by: Tony Lindgren <tony@atomide.com>
| | | * | | | | ARM: dts: am43xx-epos-evm: Don't use read/write wait monitoringRoger Quadros2014-09-041-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | NAND uses wait pin only to indicate device readiness after a block/page operation. It is not use to extend individual read/write cycle and so read/write wait pin monitoring must be disabled for NAND. Add gpmc wait pin information as the NAND uses wait pin 0 for device ready indication. Signed-off-by: Roger Quadros <rogerq@ti.com> Reviewed-by: Pekon Gupta <pekon@pek-sem.com> Signed-off-by: Tony Lindgren <tony@atomide.com>
| | | * | | | | ARM: dts: am437x-gp-evm: Don't use read/write wait monitoringRoger Quadros2014-09-041-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | NAND uses wait pin only to indicate device readiness after a block/page operation. It is not use to extend individual read/write cycle and so read/write wait pin monitoring must be disabled for NAND. This patch also gets rid of the below warning when NAND is accessed for the first time. omap_l3_noc 44000000.ocp: L3 application error: target 13 mod:1 (unclearable) Signed-off-by: Roger Quadros <rogerq@ti.com> Reviewed-by: Pekon Gupta <pekon@pek-sem.com> Signed-off-by: Tony Lindgren <tony@atomide.com>
| | | * | | | | ARM: dts: am437x-gp-evm: Use BCH16 ECC scheme instead of BCH8Roger Quadros2014-09-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | am437x-gp-evm uses a NAND chip with page size 4096 bytes and spare area of 225 bytes per page. For such a setup it is preferrable to use BCH16 ECC scheme over BCH8. This also makes it compatible with ROM code ECC scheme so we can boot with NAND after flashing from kernel. Signed-off-by: Roger Quadros <rogerq@ti.com> Reviewed-by: Pekon Gupta <pekon@pek-sem.com> Signed-off-by: Tony Lindgren <tony@atomide.com>
| | | * | | | | ARM: dts: am43x-epos-evm: Use BCH16 ECC scheme instead of BCH8Roger Quadros2014-09-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | am43x-epos-evm uses a NAND chip with page size 4096 bytes and spare area of 225 bytes per page. For such a setup it is preferrable to use BCH16 ECC scheme over BCH8. This also makes it compatible with ROM code ECC scheme so we can boot with NAND after flashing from kernel. Signed-off-by: Roger Quadros <rogerq@ti.com> Reviewed-by: Pekon Gupta <pekon@pek-sem.com> Signed-off-by: Tony Lindgren <tony@atomide.com>
| | | * | | | | ARM: dts: am4372: fix USB regs sizeFelipe Balbi2014-09-031-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Size should be 64KiB instead of 92KiB. Signed-off-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com>
| | | * | | | | ARM: dts: am437x-gp: switch i2c0 to 100KHzNishanth Menon2014-09-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On the GP EVM, the ambient light sensor is limited to 100KHz on the I2C bus. So use 100kHz for I2C on the GP EVM due to this limitation on the ambient light sensor. Reported-by: Aparna Balasubramanian <aparnab@ti.com> Signed-off-by: Nishanth Menon <nm@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com>
| | | * | | | | Merge branch 'for-v3.17-rc/ti-clk-dt' of github.com:t-kristo/linux-pm into ↵Tony Lindgren2014-09-031-0/+1
| | | |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | fixes-rc3
| | | | * | | | | ARM: dts: set 'ti,set-rate-parent' for dpll4_m5x2 clockStefan Herbrechtsmeier2014-08-211-0/+1
| | | | | |_|/ / | | | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Set 'ti,set-rate-parent' property for the dpll4_m5x2_ck clock, which is used for the ISP functional clock. This fixes the OMAP3 ISP driver's clock rate configuration on OMAP34xx, which needs the rate to be propagated properly to the divider node (dpll4_m5_ck). Signed-off-by: Stefan Herbrechtsmeier <stefan@herbrechtsmeier.net> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: Tony Lindgren <tony@atomide.com> Cc: Tero Kristo <t-kristo@ti.com> Cc: <linux-media@vger.kernel.org> Cc: <linux-omap@vger.kernel.org> Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Tero Kristo <t-kristo@ti.com>
| | | * | | | | ARM: dts: dra7-evm: Fix 8th NAND partition's nameRoger Quadros2014-09-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The 8th NAND partition should be named "NAND.u-boot-env.backup1" instead of "NAND.u-boot-env". This is to be consistent with other TI boards as well as u-boot. CC: Pekon Gupta <pekon@pek-sem.com> Signed-off-by: Roger Quadros <rogerq@ti.com> Signed-off-by: Sekhar Nori <nsekhar@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com>
| | | * | | | | ARM: dts: dra7-evm: Fix i2c3 pinmux and frequencyRoger Quadros2014-09-031-3/+3
| | | | |/ / / | | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The I2C3 pins are taken from pads E21 (GPIO6_14) and F20 (GPIO6_15). Use the right pinmux register and mode. Also set the I2C3 bus frequency to a safer 400KHz than 3.4Mhz. CC: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Roger Quadros <rogerq@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com>
| | * | | | | Merge tag 'davinci-fixes-for-v3.17-rc4' of ↵Arnd Bergmann2014-09-041-4/+5
| | |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/nsekhar/linux-davinci into fixes This patch fixes setup of second EDMA channel controller on DA850. * tag 'davinci-fixes-for-v3.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/nsekhar/linux-davinci: ARM: edma: Fix configuration parsing for SoCs with multiple eDMA3 CC
| | | * | | | | ARM: edma: Fix configuration parsing for SoCs with multiple eDMA3 CCPeter Ujfalusi2014-08-261-4/+5
| | | | |/ / / | | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The edma_setup_from_hw() should know about the CC number when parsing the CCCFG register - when it reads the register to be precise. The base addresses for CCs stored in an array and we need to provide the correct id to edma_read() in order to read the correct register. Cc: <stable@vger.kernel.org> # 3.16 Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Sekhar Nori <nsekhar@ti.com>
| | * | | | | ARM: ux500: disable msp2 node on SnowballLinus Walleij2014-09-021-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Analogous to commit 8858d88a25142544843869f0cd3e6654aa7b4aec that fixed commit 70b41abc151f9 "ARM: ux500: move MSP pin control to the device tree" accidentally activated MSP2, giving rise to a boot scroll scream as the kernel attempts to probe a driver for it and fails to obtain DMA channel 14. For some reason I forgot to fix this on the Snowball. Fix this up by marking the node disabled again. Cc: Lee Jones <lee.jones@linaro.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Tested-by: Kevin Hilman <khilman@linaro.org> Signed-off-by: Kevin Hilman <khilman@linaro.org>
| * | | | | | Merge tag 'xfs-for-linus-3.17-rc3' of ↵Linus Torvalds2014-09-064-12/+114
| |\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs Pull xfs fixes from Dave Chinner: "The fixes all address recently discovered data corruption issues. The original Direct IO issue was discovered by Chris Mason @ Facebook on a production workload which mixed buffered reads with direct reads and writes IO to the same file. The fix for that exposed other issues with page invalidation (exposed by millions of fsx operations) failing due to dirty buffers beyond EOF. Finally, the collapse_range code could also cause problems due to racing writeback changing the extent map while it was being shifted around. The commits for that problem are simple mitigation fixes that prevent the problem from occuring. A more robust fix for 3.18 that addresses the underlying problem is currently being worked on by Brian. Summary of fixes: - a direct IO read/buffered read data corruption - the associated fallout from the DIO data corruption fix - collapse range bugs that are potential data corruption issues" * tag 'xfs-for-linus-3.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: xfs: trim eofblocks before collapse range xfs: xfs_file_collapse_range is delalloc challenged xfs: don't log inode unless extent shift makes extent modifications xfs: use ranged writeback and invalidation for direct IO xfs: don't zero partial page cache pages during O_DIRECT writes xfs: don't zero partial page cache pages during O_DIRECT writes xfs: don't dirty buffers beyond EOF
| | * | | | | | xfs: trim eofblocks before collapse rangeBrian Foster2014-09-021-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | xfs_collapse_file_space() currently writes back the entire file undergoing collapse range to settle things down for the extent shift algorithm. While this prevents changes to the extent list during the collapse operation, the writeback itself is not enough to prevent unnecessary collapse failures. The current shift algorithm uses the extent index to iterate the in-core extent list. If a post-eof delalloc extent persists after the writeback (e.g., a prior zero range op where the end of the range aligns with eof can separate the post-eof blocks such that they are not written back and converted), xfs_bmap_shift_extents() becomes confused over the encoded br_startblock value and fails the collapse. As with the full writeback, this is a temporary fix until the algorithm is improved to cope with a volatile extent list and avoid attempts to shift post-eof extents. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
| | * | | | | | xfs: xfs_file_collapse_range is delalloc challengedDave Chinner2014-09-021-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we have delalloc extents on a file before we run a collapse range opertaion, we sync the range that we are going to collapse to convert delalloc extents in that region to real extents to simplify the shift operation. However, the shift operation then assumes that the extent list is not going to change as it iterates over the extent list moving things about. Unfortunately, this isn't true because we can't hold the ILOCK over all the operations. We can prevent new IO from modifying the extent list by holding the IOLOCK, but that doesn't prevent writeback from running.... And when writeback runs, it can convert delalloc extents is the range of the file prior to the region being collapsed, and this changes the indexes of all the extents in the file. That causes the collapse range operation to Go Bad. The right fix is to rewrite the extent shift operation not to be dependent on the extent list not changing across the entire operation, but this is a fairly significant piece of work to do. Hence, as a short-term workaround for the problem, sync the entire file before starting a collapse operation to remove all delalloc ranges from the file and so avoid the problem of concurrent writeback changing the extent list. Diagnosed-and-Reported-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
| | * | | | | | xfs: don't log inode unless extent shift makes extent modificationsBrian Foster2014-09-021-8/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The file collapse mechanism uses xfs_bmap_shift_extents() to collapse all subsequent extents down into the specified, previously punched out, region. This function performs some validation, such as whether a sufficient hole exists in the target region of the collapse, then shifts the remaining exents downward. The exit path of the function currently logs the inode unconditionally. While we must log the inode (and abort) if an error occurs and the transaction is dirty, the initial validation paths can generate errors before the transaction has been dirtied. This creates an unnecessary filesystem shutdown scenario, as the caller will cancel a transaction that has been marked dirty. Modify xfs_bmap_shift_extents() to OR the logflags bits as modifications are made to the inode bmap. Only log the inode in the exit path if logflags has been set. This ensures we only have to cancel a dirty transaction if modifications have been made and prevents an unnecessary filesystem shutdown otherwise. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
| | * | | | | | xfs: use ranged writeback and invalidation for direct IODave Chinner2014-09-021-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now we are not doing silly things with dirtying buffers beyond EOF and using invalidation correctly, we can finally reduce the ranges of writeback and invalidation used by direct IO to match that of the IO being issued. Bring the writeback and invalidation ranges back to match the generic direct IO code - this will greatly reduce the perturbation of cached data when direct IO and buffered IO are mixed, but still provide the same buffered vs direct IO coherency behaviour we currently have. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
| | * | | | | | xfs: don't zero partial page cache pages during O_DIRECT writesDave Chinner2014-09-021-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Similar to direct IO reads, direct IO writes are using truncate_pagecache_range to invalidate the page cache. This is incorrect due to the sub-block zeroing in the page cache that truncate_pagecache_range() triggers. This patch fixes things by using invalidate_inode_pages2_range instead. It preserves the page cache invalidation, but won't zero any pages. cc: stable@vger.kernel.org Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
| | * | | | | | xfs: don't zero partial page cache pages during O_DIRECT writesChris Mason2014-09-021-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | xfs is using truncate_pagecache_range to invalidate the page cache during DIO reads. This is different from the other filesystems who only invalidate pages during DIO writes. truncate_pagecache_range is meant to be used when we are freeing the underlying data structs from disk, so it will zero any partial ranges in the page. This means a DIO read can zero out part of the page cache page, and it is possible the page will stay in cache. buffered reads will find an up to date page with zeros instead of the data actually on disk. This patch fixes things by using invalidate_inode_pages2_range instead. It preserves the page cache invalidation, but won't zero any pages. [dchinner: catch error and warn if it fails. Comment.] cc: stable@vger.kernel.org Signed-off-by: Chris Mason <clm@fb.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
| | * | | | | | xfs: don't dirty buffers beyond EOFDave Chinner2014-09-021-0/+61
| | | |_|_|_|/ | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | generic/263 is failing fsx at this point with a page spanning EOF that cannot be invalidated. The operations are: 1190 mapwrite 0x52c00 thru 0x5e569 (0xb96a bytes) 1191 mapread 0x5c000 thru 0x5d636 (0x1637 bytes) 1192 write 0x5b600 thru 0x771ff (0x1bc00 bytes) where 1190 extents EOF from 0x54000 to 0x5e569. When the direct IO write attempts to invalidate the cached page over this range, it fails with -EBUSY and so any attempt to do page invalidation fails. The real question is this: Why can't that page be invalidated after it has been written to disk and cleaned? Well, there's data on the first two buffers in the page (1k block size, 4k page), but the third buffer on the page (i.e. beyond EOF) is failing drop_buffers because it's bh->b_state == 0x3, which is BH_Uptodate | BH_Dirty. IOWs, there's dirty buffers beyond EOF. Say what? OK, set_buffer_dirty() is called on all buffers from __set_page_buffers_dirty(), regardless of whether the buffer is beyond EOF or not, which means that when we get to ->writepage, we have buffers marked dirty beyond EOF that we need to clean. So, we need to implement our own .set_page_dirty method that doesn't dirty buffers beyond EOF. This is messy because the buffer code is not meant to be shared and it has interesting locking issues on the buffer dirty bits. So just copy and paste it and then modify it to suit what we need. Note: the solutions the other filesystems and generic block code use of marking the buffers clean in ->writepage does not work for XFS. It still leaves dirty buffers beyond EOF and invalidations still fail. Hence rather than play whack-a-mole, this patch simply prevents those buffers from being dirtied in the first place. cc: <stable@kernel.org> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
OpenPOWER on IntegriCloud