blackbird-op-linux - Blackbird™ Linux sources for OpenPOWER

	Commit message (Collapse)	Author	Age	Files	Lines
*	drivers: thermal: tsens: Introduce IP-specific max_sensor count	Amit Kucheria	2019-05-14	4	-2/+6
\| \| \| \| \| \| \| \| \| \| \|	The IP can support 'm' sensors while the platform can enable 'n' sensors of the 'm' where n <= m. Track maximum sensors supported by the IP so that we can correctly track what subset of the sensors are supported on the platform. Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
*	drivers: thermal: tsens: change data type for sensor IDs	Amit Kucheria	2019-05-14	1	-2/+2
\| \| \| \| \| \| \|	The IDs cannot be negative, fix the data type. Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
*	drivers: thermal: tsens: Add new operation to check if a sensor is enabled	Amit Kucheria	2019-05-14	5	-0/+22
\| \| \| \| \| \| \| \| \|	is_sensor_enabled() checks if the sensors are enabled on this platform. It is possible that the SoC might choose not to enable all the sensors that the IP block is capable of supporting. Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
*	drivers: thermal: tsens: Don't print error message on -EPROBE_DEFER	Amit Kucheria	2019-05-14	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We print a calibration failure message on -EPROBE_DEFER from nvmem/qfprom as follows: [ 3.003090] qcom-tsens 4a9000.thermal-sensor: version: 1.4 [ 3.005376] qcom-tsens 4a9000.thermal-sensor: tsens calibration failed [ 3.113248] qcom-tsens 4a9000.thermal-sensor: version: 1.4 This confuses people when, in fact, calibration succeeds later when nvmem/qfprom device is available. Don't print this message on a -EPROBE_DEFER. Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
*	drivers: thermal: tsens: Save reference to the device pointer and use it	Amit Kucheria	2019-05-14	1	-7/+8
\| \| \| \| \| \| \|	Code cleanup making it easier to read Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
*	drivers: thermal: tsens: Introduce reg_fields to deal with register description	Amit Kucheria	2019-05-14	5	-58/+370
\| \| \| \| \| \| \| \| \| \| \| \| \|	As we add support for newer versions of the TSENS IP, the current approach isn't scaling because registers and bitfields get moved around, requiring platform-specific hacks in the code. By moving to regmap, we can hide the register level differences away from the code. Define a common set of registers and bit-fields that we care about across the various tsens IP versions. Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
*	drivers: thermal: tsens: Merge tsens-8974 into tsens-v0_1	Amit Kucheria	2019-05-13	4	-239/+236
\| \| \| \| \| \| \| \|	8974 and 8916 have the same version of the TSENS IP. Merge the files to allow for better code reuse. Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
*	drivers: thermal: tsens: Rename constants to prepare to merge with tsens-8974	Amit Kucheria	2019-05-13	1	-44/+44
\| \| \| \| \| \| \| \|	Some #defines in tsens-v_0_1.c clash with those in tsens-8974.c. Prefix them with 8916 to avoid the clash so we can merge the two files. Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
*	drivers: thermal: tsens: Rename tsens-8916 to prepare to merge with tsens-8974	Amit Kucheria	2019-05-13	2	-1/+1
\| \| \| \| \| \| \| \|	8916 and 8974 use v0.1.0 of the TSENS IP. Rename tsens-8916 to prepare it for merging with tsens-8974 in a later commit. Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
*	drivers: thermal: tsens: Function prototypes should have argument names	Amit Kucheria	2019-05-13	1	-12/+12
\| \| \| \| \| \| \| \| \| \| \| \|	check_patch complains a lot as follows: WARNING: function definition argument 'struct tsens_priv ' should also have an identifier name + int (init)(struct tsens_priv *); Fix it. Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
*	drivers: thermal: tsens: Use consistent names for variables	Amit Kucheria	2019-05-13	1	-3/+3
\| \| \| \| \| \| \| \| \| \|	tsens_get_temp() uses the name 'data' for the void pointer, use the same in tsens_get_trend() for consistency. Remove a stray space while we're at it. Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
*	drivers: thermal: tsens: Rename variable tmdev	Amit Kucheria	2019-05-13	7	-132/+131
\| \| \| \| \| \| \| \| \|	tmdev seems to imply that this is a device pointer when in fact it is just private platform data for each tsens device. Rename it to priv improve code readability. Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
*	drivers: thermal: tsens: Rename tsens_device	Amit Kucheria	2019-05-13	7	-35/+35
\| \| \| \| \| \| \| \|	Rename to tsens_priv to denote that it is private data for each tsens instance. Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
*	drivers: thermal: tsens: Rename tsens_data	Amit Kucheria	2019-05-13	6	-10/+10
\| \| \| \| \| \| \| \|	Rename to tsens_plat_data to denote that it is platform-data passed in at compile-time. Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
*	drivers: thermal: tsens: Document the data structures	Amit Kucheria	2019-05-13	1	-3/+28
\| \| \| \| \| \| \| \|	Describe how the TSENS device and the various sensors connected to it are described in the driver Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
*	thermal: stm32: simplify getting .driver_data	Wolfram Sang	2019-05-13	1	-4/+2
\| \| \| \| \| \| \| \| \| \|	We should get 'driver_data' from 'struct device' directly. Going via platform_device is an unneeded step back and forth. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Simon Horman <horms+renesas@verge.net.au> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
*	drivers: thermal: Kconfig: pedantic cleanups	Enrico Weigelt, metux IT consult	2019-05-13	2	-13/+13
\| \| \| \| \| \| \| \|	Formatting of Kconfig files doesn't look so pretty, so just take damp cloth and clean it up. Signed-off-by: Enrico Weigelt, metux IT consult <info@metux.net> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
*	of: thermal: Improve print information	Yangtao Li	2019-05-13	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Define pr_fmt macro to add a prefix to the message, this can make the thermal log better recognized. Before: [ 0.602672] nfc: nfc_init: NFC Core ver 0.1 [ 0.602828] NET: Registered protocol family 39 [ 0.603435] clocksource: Switched to clocksource mct-frc [ 0.746216] failed to build thermal zone cpu-thermal: -22 [ 0.746451] NET: Registered protocol family 2 After: [ 0.602804] NET: Registered protocol family 39 [ 0.603463] clocksource: Switched to clocksource mct-frc [ 0.746309] thermal_sys: failed to build thermal zone cpu-thermal: -22 [ 0.746545] NET: Registered protocol family 2 Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
*	thermal: tegra: enable OC hw throttle	Wei Ni	2019-05-13	1	-10/+120
\| \| \| \| \| \| \| \| \|	Parse Over Current settings from DT and program them to generate interrupts. Also enable hw throttling whenever there are OC events. Log the OC events as debug messages. Signed-off-by: Wei Ni <wni@nvidia.com> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
*	of: Add bindings of OC hw throttle for Tegra soctherm	Wei Ni	2019-05-13	1	-0/+25
\| \| \| \| \| \| \| \|	Add OC HW throttle configuration for soctherm in DT. It is used to describe the OCx throttle events. Signed-off-by: Wei Ni <wni@nvidia.com> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
*	thermal: tegra: add support for EDP IRQ	Wei Ni	2019-05-13	1	-0/+420
\| \| \| \| \| \| \| \|	Add support to generate OC (over-current) interrupts to indicate the OC event and print out alarm messages. Signed-off-by: Wei Ni <wni@nvidia.com> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
*	thermal: tegra: add set_trips functionality	Wei Ni	2019-05-13	5	-5/+90
\| \| \| \| \| \| \|	Implement set_trips ops to set passive trip points. Signed-off-by: Wei Ni <wni@nvidia.com> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
*	thermal: tegra: add support for thermal IRQ	Wei Ni	2019-05-13	1	-0/+136
\| \| \| \| \| \| \| \|	Support to generate an interrupt when the temperature crosses a programmed threshold and notify the thermal framework. Signed-off-by: Wei Ni <wni@nvidia.com> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
*	thermal: tegra: add support for gpu hw-throttle	Wei Ni	2019-05-13	1	-33/+85
\| \| \| \| \| \| \| \| \| \|	Add support to trigger pulse skippers on the GPU when a HOT trip point is triggered. The pulse skippers can be signalled to throttle at low, medium and high depths\levels. Signed-off-by: Wei Ni <wni@nvidia.com> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
*	of: Add bindings of gpu hw throttle for Tegra soctherm	Wei Ni	2019-05-13	2	-6/+19
\| \| \| \| \| \| \| \|	Add "nvidia,gpu-throt-level" property to set gpu hw throttle level. Signed-off-by: Wei Ni <wni@nvidia.com> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
*	thermal: tegra: support hw and sw shutdown	Wei Ni	2019-05-13	3	-15/+98
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently the critical trip points in thermal framework are the only way to specify a temperature at which HW should shutdown. This is insufficient for certain platforms which would want an orderly software shutdown in addition to HW shutdown. This change support to parse "nvidia, thermtrips" property, it allows soctherm DT to specify thermtrip temperatures so that critical trip points framework can be used for doing software shutdown. Signed-off-by: Wei Ni <wni@nvidia.com> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
*	of: Add bindings of thermtrip for Tegra soctherm	Wei Ni	2019-05-13	1	-3/+17
\| \| \| \| \| \| \| \| \|	Add optional property "nvidia,thermtrips". If present, these trips will be used as HW shutdown trips, and critical trips will be used as SW shutdown trips. Signed-off-by: Wei Ni <wni@nvidia.com> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
*	Linux 5.1-rc7v5.1-rc7	Linus Torvalds	2019-04-28	1	-1/+1
\|
*	Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm	Linus Torvalds	2019-04-28	4	-6/+20
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull ARM fixes from Russell King: "A small number of ARM fixes - Fix function tracer and unwinder dependencies so that we don't end up building kernels that will crash - Fix ARMv7M nommu initialisation (missing register initialisation) - Fix EFI decompressor entry (ensuring barrier instructions are enabled prior to use)" * tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: ARM: 8857/1: efi: enable CP15 DMB instructions before cleaning the cache ARM: 8856/1: NOMMU: Fix CCR register faulty initialization when MPU is disabled ARM: fix function graph tracer and unwinder dependencies
\| *	ARM: 8857/1: efi: enable CP15 DMB instructions before cleaning the cache	Ard Biesheuvel	2019-04-23	1	-1/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The EFI stub is entered with the caches and MMU enabled by the firmware, and once the stub is ready to hand over to the decompressor, we clean and disable the caches. The cache clean routines use CP15 barrier instructions, which can be disabled via SCTLR. Normally, when using the provided cache handling routines to enable the caches and MMU, this bit is enabled as well. However, but since we entered the stub with the caches already enabled, this routine is not executed before we call the cache clean routines, resulting in undefined instruction exceptions if the firmware never enabled this bit. So set the bit explicitly in the EFI entry code, but do so in a way that guarantees that the resulting code can still run on v6 cores as well (which are guaranteed to have CP15 barriers enabled) Cc: <stable@vger.kernel.org> # v4.9+ Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
\| *	ARM: 8856/1: NOMMU: Fix CCR register faulty initialization when MPU is disabled	Tigran Tadevosyan	2019-04-23	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When CONFIG_ARM_MPU is not defined, the base address of v7M SCB register is not initialized with correct value. This prevents enabling I/D caches when the L1 cache poilcy is applied in kernel. Fixes: 3c24121039c9da14692eb48f6e39565b28c0f3cf ("ARM: 8756/1: NOMMU: Postpone MPU activation till __after_proc_init") Signed-off-by: Tigran Tadevosyan <tigran.tadevosyan@arm.com> Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
\| *	ARM: fix function graph tracer and unwinder dependencies	Russell King	2019-04-23	2	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Naresh Kamboju recently reported that the function-graph tracer crashes on ARM. The function-graph tracer assumes that the kernel is built with frame pointers. We explicitly disabled the function-graph tracer when building Thumb2, since the Thumb2 ABI doesn't have frame pointers. We recently changed the way the unwinder method was selected, which seems to have made it more likely that we can end up with the function- graph tracer enabled but without the kernel built with frame pointers. Fix up the function graph tracer dependencies so the option is not available when we have no possibility of having frame pointers, and adjust the dependencies on the unwinder option to hide the non-frame pointer unwinder options if the function-graph tracer is enabled. Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
* \|	Merge tag 'powerpc-5.1-6' of ↵	Linus Torvalds	2019-04-28	3	-40/+60
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "A one-liner to make our Radix MMU support depend on HUGETLB_PAGE. We use some of the hugetlb inlines (eg. pud_huge()) when operating on the linear mapping and if they're compiled into empty wrappers we can corrupt memory. Then two fixes to our VFIO IOMMU code. The first is not a regression but fixes the locking to avoid a user-triggerable deadlock. The second does fix a regression since rc1, and depends on the first fix. It makes it possible to run guests with large amounts of memory again (~256GB). Thanks to Alexey Kardashevskiy" * tag 'powerpc-5.1-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/mm_iommu: Allow pinning large regions powerpc/mm_iommu: Fix potential deadlock powerpc/mm/radix: Make Radix require HUGETLB_PAGE
\| * \|	powerpc/mm_iommu: Allow pinning large regions	Alexey Kardashevskiy	2019-04-17	1	-4/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When called with vmas_arg==NULL, get_user_pages_longterm() allocates an array of nr_pages*8 which can easily get greater that the max order, for example, registering memory for a 256GB guest does this and fails in __alloc_pages_nodemask(). This adds a loop over chunks of entries to fit the max order limit. Fixes: 678e174c4c16 ("powerpc/mm/iommu: allow migration of cma allocated pages during mm_iommu_do_alloc", 2019-03-05) Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
\| * \|	powerpc/mm_iommu: Fix potential deadlock	Alexey Kardashevskiy	2019-04-17	1	-36/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently mm_iommu_do_alloc() is called in 2 cases: - VFIO_IOMMU_SPAPR_REGISTER_MEMORY ioctl() for normal memory: this locks &mem_list_mutex and then locks mm::mmap_sem several times when adjusting locked_vm or pinning pages; - vfio_pci_nvgpu_regops::mmap() for GPU memory: this is called with mm::mmap_sem held already and it locks &mem_list_mutex. So one can craft a userspace program to do special ioctl and mmap in 2 threads concurrently and cause a deadlock which lockdep warns about (below). We did not hit this yet because QEMU constructs the machine in a single thread. This moves the overlap check next to where the new entry is added and reduces the amount of time spent with &mem_list_mutex held. This moves locked_vm adjustment from under &mem_list_mutex. This relies on mm_iommu_adjust_locked_vm() doing nothing when entries==0. This is one of the lockdep warnings: ====================================================== WARNING: possible circular locking dependency detected 5.1.0-rc2-le_nv2_aikATfstn1-p1 #363 Not tainted ------------------------------------------------------ qemu-system-ppc/8038 is trying to acquire lock: 000000002ec6c453 (mem_list_mutex){+.+.}, at: mm_iommu_do_alloc+0x70/0x490 but task is already holding lock: 00000000fd7da97f (&mm->mmap_sem){++++}, at: vm_mmap_pgoff+0xf0/0x160 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&mm->mmap_sem){++++}: lock_acquire+0xf8/0x260 down_write+0x44/0xa0 mm_iommu_adjust_locked_vm.part.1+0x4c/0x190 mm_iommu_do_alloc+0x310/0x490 tce_iommu_ioctl.part.9+0xb84/0x1150 [vfio_iommu_spapr_tce] vfio_fops_unl_ioctl+0x94/0x430 [vfio] do_vfs_ioctl+0xe4/0x930 ksys_ioctl+0xc4/0x110 sys_ioctl+0x28/0x80 system_call+0x5c/0x70 -> #0 (mem_list_mutex){+.+.}: __lock_acquire+0x1484/0x1900 lock_acquire+0xf8/0x260 __mutex_lock+0x88/0xa70 mm_iommu_do_alloc+0x70/0x490 vfio_pci_nvgpu_mmap+0xc0/0x130 [vfio_pci] vfio_pci_mmap+0x198/0x2a0 [vfio_pci] vfio_device_fops_mmap+0x44/0x70 [vfio] mmap_region+0x5d4/0x770 do_mmap+0x42c/0x650 vm_mmap_pgoff+0x124/0x160 ksys_mmap_pgoff+0xdc/0x2f0 sys_mmap+0x40/0x80 system_call+0x5c/0x70 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&mm->mmap_sem); lock(mem_list_mutex); lock(&mm->mmap_sem); lock(mem_list_mutex); * DEADLOCK * 1 lock held by qemu-system-ppc/8038: #0: 00000000fd7da97f (&mm->mmap_sem){++++}, at: vm_mmap_pgoff+0xf0/0x160 Fixes: c10c21efa4bc ("powerpc/vfio/iommu/kvm: Do not pin device memory", 2018-12-19) Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
\| * \|	powerpc/mm/radix: Make Radix require HUGETLB_PAGE	Michael Ellerman	2019-04-17	2	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Joel reported weird crashes using skiroot_defconfig, in his case we jumped into an NX page: kernel tried to execute exec-protected page (c000000002bff4f0) - exploit attempt? (uid: 0) BUG: Unable to handle kernel instruction fetch Faulting instruction address: 0xc000000002bff4f0 Looking at the disassembly, we had simply branched to that address: c000000000c001bc 49fff335 bl c000000002bff4f0 But that didn't match the original kernel image: c000000000c001bc 4bfff335 bl c000000000bff4f0 <kobject_get+0x8> When STRICT_KERNEL_RWX is enabled, and we're using the radix MMU, we call radix__change_memory_range() late in boot to change page protections. We do that both to mark rodata read only and also to mark init text no-execute. That involves walking the kernel page tables, and clearing _PAGE_WRITE or _PAGE_EXEC respectively. With radix we may use hugepages for the linear mapping, so the code in radix__change_memory_range() uses eg. pmd_huge() to test if it has found a huge mapping, and if so it stops the page table walk and changes the PMD permissions. However if the kernel is built without HUGETLBFS support, pmd_huge() is just a #define that always returns 0. That causes the code in radix__change_memory_range() to incorrectly interpret the PMD value as a pointer to a PTE page rather than as a PTE at the PMD level. We can see this using `dv` in xmon which also uses pmd_huge(): 0:mon> dv c000000000000000 pgd @ 0xc000000001740000 pgdp @ 0xc000000001740000 = 0x80000000ffffb009 pudp @ 0xc0000000ffffb000 = 0x80000000ffffa009 pmdp @ 0xc0000000ffffa000 = 0xc00000000000018f <- this is a PTE ptep @ 0xc000000000000100 = 0xa64bb17da64ab07d <- kernel text The end result is we treat the value at 0xc000000000000100 as a PTE and clear _PAGE_WRITE or _PAGE_EXEC, potentially corrupting the code at that address. In Joel's specific case we cleared the sign bit in the offset of the branch, causing a backward branch to turn into a forward branch which caused us to branch into a non-executable page. However the exact nature of the crash depends on kernel version, compiler version, and other factors. We need to fix radix__change_memory_range() to not use accessors that depend on HUGETLBFS, but we also have radix memory hotplug code that uses pmd_huge() etc that will also need fixing. So for now just disallow the broken combination of Radix with HUGETLBFS disabled. The only defconfig we have that is affected is skiroot_defconfig, so turn on HUGETLBFS there so that it still gets Radix. Fixes: 566ca99af026 ("powerpc/mm/radix: Add dummy radix_enabled()") Cc: stable@vger.kernel.org # v4.7+ Reported-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* \| \|	Merge tag 'for-linus-20190428' of git://git.kernel.dk/linux-block	Linus Torvalds	2019-04-28	1	-16/+26
\|\ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull block fixes from Jens Axboe: "A set of io_uring fixes that should go into this release. In particular, this contains: - The mutex lock vs ctx ref count fix (me) - Removal of a dead variable (me) - Two race fixes (Stefan) - Ring head/tail condition fix for poll full SQ detection (Stefan)" * tag 'for-linus-20190428' of git://git.kernel.dk/linux-block: io_uring: remove 'state' argument from io_{read,write} path io_uring: fix poll full SQ detection io_uring: fix race condition when sq threads goes sleeping io_uring: fix race condition reading SQ entries io_uring: fail io_uring_register(2) on a dying io_uring instance
\| * \| \|	io_uring: remove 'state' argument from io_{read,write} path	Jens Axboe	2019-04-23	1	-13/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since commit 09bb839434b we don't use the state argument for any sort of on-stack caching in the io read and write path. Remove the stale and unused argument from them, and bubble it up to __io_submit_sqe() and down to io_prep_rw(). Signed-off-by: Jens Axboe <axboe@kernel.dk>
\| * \| \|	io_uring: fix poll full SQ detection	Stefan Bühler	2019-04-22	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	io_uring_poll shouldn't signal EPOLLOUT \| EPOLLWRNORM if the queue is full; the old check would always signal EPOLLOUT \| EPOLLWRNORM (unless there were U32_MAX - 1 entries in the SQ queue). Signed-off-by: Stefan Bühler <source@stbuehler.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
\| * \| \|	io_uring: fix race condition when sq threads goes sleeping	Stefan Bühler	2019-04-22	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Reading the SQ tail needs to come after setting IORING_SQ_NEED_WAKEUP in flags; there is no cheap barrier for ordering a store before a load, a full memory barrier is required. Userspace needs a full memory barrier between updating SQ tail and checking for the IORING_SQ_NEED_WAKEUP too. Signed-off-by: Stefan Bühler <source@stbuehler.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
\| * \| \|	io_uring: fix race condition reading SQ entries	Stefan Bühler	2019-04-22	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A read memory barrier is required between reading SQ tail and reading the actual data belonging to the SQ entry. Userspace needs a matching write barrier between writing SQ entries and updating SQ tail (using smp_store_release to update tail will do). Signed-off-by: Stefan Bühler <source@stbuehler.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
\| * \| \|	io_uring: fail io_uring_register(2) on a dying io_uring instance	Jens Axboe	2019-04-22	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we have multiple threads doing io_uring_register(2) on an io_uring fd, then we can potentially try and kill the percpu reference while someone else has already killed it. Prevent this race by failing io_uring_register(2) if the ref is marked dying. This is safe since we're inside the io_uring mutex. Fixes: b19062a56726 ("io_uring: fix possible deadlock between io_uring_{enter,register}") Reported-by: syzbot <syzbot+10d25e23199614b7721f@syzkaller.appspotmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* \| \| \|	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma	Linus Torvalds	2019-04-28	7	-20/+76
\|\ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull rdma fixes from Jason Gunthorpe: "One core bug fix and a few driver ones - FRWR memory registration for hfi1/qib didn't work with with some iovas causing a NFSoRDMA failure regression due to a fix in the NFS side - A command flow error in mlx5 allowed user space to send a corrupt command (and also smash the kernel stack we've since learned) - Fix a regression and some bugs with device hot unplug that was discovered while reviewing Andrea's patches - hns has a failure if the user asks for certain QP configurations" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/hns: Bugfix for mapping user db RDMA/ucontext: Fix regression with disassociate RDMA/mlx5: Use rdma_user_map_io for mapping BAR pages RDMA/mlx5: Do not allow the user to write to the clock page IB/mlx5: Fix scatter to CQE in DCT QP creation IB/rdmavt: Fix frwr memory registration
\| * \| \| \|	RDMA/hns: Bugfix for mapping user db	Lijun Ou	2019-04-25	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When the maximum send wr delivered by the user is zero, the qp does not have a sq. When allocating the sq db buffer to store the user sq pi pointer and map it to the kernel mode, max_send_wr is used as the trigger condition, while the kernel does not consider the max_send_wr trigger condition when mapmping db. It will cause sq record doorbell map fail and create qp fail. The failed print information as follows: hns3 0000:7d:00.1: Send cmd: tail - 418, opcode - 0x8504, flag - 0x0011, retval - 0x0000 hns3 0000:7d:00.1: Send cmd: 0xe59dc000 0x00000000 0x00000000 0x00000000 0x00000116 0x0000ffff hns3 0000:7d:00.1: sq record doorbell map failed! hns3 0000:7d:00.1: Create RC QP failed Fixes: 0425e3e6e0c7 ("RDMA/hns: Support flush cqe for hip08 in kernel space") Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
\| * \| \| \|	RDMA/ucontext: Fix regression with disassociate	Jason Gunthorpe	2019-04-24	2	-3/+50
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When this code was consolidated the intention was that the VMA would become backed by anonymous zero pages after the zap_vma_pte - however this very subtly relied on setting the vm_ops = NULL and clearing the VM_SHARED bits to transform the VMA into an anonymous VMA. Since the vm_ops was removed this broke. Now userspace gets a SIGBUS if it touches the vma after disassociation. Instead of converting the VMA to anonymous provide a fault handler that puts a zero'd page into the VMA when user-space touches it after disassociation. Cc: stable@vger.kernel.org Suggested-by: Andrea Arcangeli <aarcange@redhat.com> Fixes: 5f9794dc94f5 ("RDMA/ucontext: Add a core API for mmaping driver IO memory") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
\| * \| \| \|	RDMA/mlx5: Use rdma_user_map_io for mapping BAR pages	Jason Gunthorpe	2019-04-24	1	-5/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since mlx5 supports device disassociate it must use this API for all BAR page mmaps, otherwise the pages can remain mapped after the device is unplugged causing a system crash. Cc: stable@vger.kernel.org Fixes: 5f9794dc94f5 ("RDMA/ucontext: Add a core API for mmaping driver IO memory") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
\| * \| \| \|	RDMA/mlx5: Do not allow the user to write to the clock page	Jason Gunthorpe	2019-04-24	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The intent of this VMA was to be read-only from user space, but the VM_MAYWRITE masking was missed, so mprotect could make it writable. Cc: stable@vger.kernel.org Fixes: 5c99eaecb1fc ("IB/mlx5: Mmap the HCA's clock info to user-space") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
\| * \| \| \|	IB/mlx5: Fix scatter to CQE in DCT QP creation	Guy Levi	2019-04-18	3	-4/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When scatter to CQE is enabled on a DCT QP it corrupts the mailbox command since it tried to treat it as as QP create mailbox command instead of a DCT create command. The corrupted mailbox command causes userspace to malfunction as the device doesn't create the QP as expected. A new mlx5 capability is exposed to user-space which ensures that it will not enable the feature on DCT without this fix in the kernel. Fixes: 5d6ff1babe78 ("IB/mlx5: Support scatter to CQE for DC transport type") Signed-off-by: Guy Levi <guyle@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
\| * \| \| \|	IB/rdmavt: Fix frwr memory registration	Josh Collier	2019-04-16	1	-7/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Current implementation was not properly handling frwr memory registrations. This was uncovered by commit 27f26cec761das ("xprtrdma: Plant XID in on-the-wire RDMA offset (FRWR)") in which xprtrdma, which is used for NFS over RDMA, started failing as it was the first ULP to modify the ib_mr iova resulting in the NFS server getting REMOTE ACCESS ERROR when attempting to perform RDMA Writes to the client. The fix is to properly capture the true iova, offset, and length in the call to ib_map_mr_sg, and then update the iova when processing the IB_WR_REG_MEM on the send queue. Fixes: a41081aa5936 ("IB/rdmavt: Add support for ib_map_mr_sg") Cc: stable@vger.kernel.org Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Josh Collier <josh.d.collier@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* \| \| \| \|	Merge tag 'dmaengine-fix-5.1-rc7' of ↵	Linus Torvalds	2019-04-28	3	-6/+28
\|\ \ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.infradead.org/users/vkoul/slave-dma Pull dmaengine fixes from Vinod Koul: - fix for wrong register use in mediatek driver - fix in sh driver for glitch is tx_status and treating 0 a valid residue for cyclic - fix in bcm driver for using right memory allocation flag * tag 'dmaengine-fix-5.1-rc7' of git://git.infradead.org/users/vkoul/slave-dma: dmaengine: mediatek-cqdma: fix wrong register usage in mtk_cqdma_start dmaengine: sh: rcar-dmac: Fix glitch in dmaengine_tx_status dmaengine: sh: rcar-dmac: With cyclic DMA residue 0 is valid dmaengine: bcm2835: Avoid GFP_KERNEL in device_prep_slave_sg