From 3f70dc38cec2ad6e5355f80c4c7a15a3f7e97a19 Mon Sep 17 00:00:00 2001 From: Michal Hocko Date: Fri, 7 Oct 2016 16:59:06 -0700 Subject: mm: make sure that kthreads will not refault oom reaped memory There are only few use_mm() users in the kernel right now. Most of them write to the target memory but vhost driver relies on copy_from_user/get_user from a kernel thread context. This makes it impossible to reap the memory of an oom victim which shares the mm with the vhost kernel thread because it could see a zero page unexpectedly and theoretically make an incorrect decision visible outside of the killed task context. To quote Michael S. Tsirkin: : Getting an error from __get_user and friends is handled gracefully. : Getting zero instead of a real value will cause userspace : memory corruption. The vhost kernel thread is bound to an open fd of the vhost device which is not tight to the mm owner life cycle in general. The device fd can be inherited or passed over to another process which means that we really have to be careful about unexpected memory corruption because unlike for normal oom victims the result will be visible outside of the oom victim context. Make sure that no kthread context (users of use_mm) can ever see corrupted data because of the oom reaper and hook into the page fault path by checking MMF_UNSTABLE mm flag. __oom_reap_task_mm will set the flag before it starts unmapping the address space while the flag is checked after the page fault has been handled. If the flag is set then SIGBUS is triggered so any g-u-p user will get a error code. Regular tasks do not need this protection because all which share the mm are killed when the mm is reaped and so the corruption will not outlive them. This patch shouldn't have any visible effect at this moment because the OOM killer doesn't invoke oom reaper for tasks with mm shared with kthreads yet. Link: http://lkml.kernel.org/r/1472119394-11342-9-git-send-email-mhocko@kernel.org Signed-off-by: Michal Hocko Acked-by: "Michael S. Tsirkin" Cc: Tetsuo Handa Cc: Oleg Nesterov Cc: David Rientjes Cc: Vladimir Davydov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/memory.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) (limited to 'mm/memory.c') diff --git a/mm/memory.c b/mm/memory.c index f1a68049edff..4bfc3a9c3b18 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3658,6 +3658,19 @@ int handle_mm_fault(struct vm_area_struct *vma, unsigned long address, mem_cgroup_oom_synchronize(false); } + /* + * This mm has been already reaped by the oom reaper and so the + * refault cannot be trusted in general. Anonymous refaults would + * lose data and give a zero page instead e.g. This is especially + * problem for use_mm() because regular tasks will just die and + * the corrupted data will not be visible anywhere while kthread + * will outlive the oom victim and potentially propagate the date + * further. + */ + if (unlikely((current->flags & PF_KTHREAD) && !(ret & VM_FAULT_ERROR) + && test_bit(MMF_UNSTABLE, &vma->vm_mm->flags))) + ret = VM_FAULT_SIGBUS; + return ret; } EXPORT_SYMBOL_GPL(handle_mm_fault); -- cgit v1.2.3 From 87744ab3832b83ba71b931f86f9cfdb000d07da5 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Fri, 7 Oct 2016 17:00:18 -0700 Subject: mm: fix cache mode tracking in vm_insert_mixed() vm_insert_mixed() unlike vm_insert_pfn_prot() and vmf_insert_pfn_pmd(), fails to check the pgprot_t it uses for the mapping against the one recorded in the memtype tracking tree. Add the missing call to track_pfn_insert() to preclude cases where incompatible aliased mappings are established for a given physical address range. Link: http://lkml.kernel.org/r/147328717909.35069.14256589123570653697.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams Cc: David Airlie Cc: Matthew Wilcox Cc: Ross Zwisler Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/memory.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) (limited to 'mm/memory.c') diff --git a/mm/memory.c b/mm/memory.c index 4bfc3a9c3b18..fc1987dfd8cc 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1649,10 +1649,14 @@ EXPORT_SYMBOL(vm_insert_pfn_prot); int vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr, pfn_t pfn) { + pgprot_t pgprot = vma->vm_page_prot; + BUG_ON(!(vma->vm_flags & VM_MIXEDMAP)); if (addr < vma->vm_start || addr >= vma->vm_end) return -EFAULT; + if (track_pfn_insert(vma, &pgprot, pfn)) + return -EINVAL; /* * If we don't have pte special, then we have to use the pfn_valid() @@ -1670,9 +1674,9 @@ int vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr, * result in pfn_t_has_page() == false. */ page = pfn_to_page(pfn_t_to_pfn(pfn)); - return insert_page(vma, addr, page, vma->vm_page_prot); + return insert_page(vma, addr, page, pgprot); } - return insert_pfn(vma, addr, pfn, vma->vm_page_prot); + return insert_pfn(vma, addr, pfn, pgprot); } EXPORT_SYMBOL(vm_insert_mixed); -- cgit v1.2.3