summaryrefslogtreecommitdiffstats
path: root/fs/proc/base.c
Commit message (Collapse)AuthorAgeFilesLines
* proc, oom: do not report alien mms when setting oom_score_adjMichal Hocko2019-02-211-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Tetsuo has reported that creating a thousands of processes sharing MM without SIGHAND (aka alien threads) and setting /proc/<pid>/oom_score_adj will swamp the kernel log and takes ages [1] to finish. This is especially worrisome that all that printing is done under RCU lock and this can potentially trigger RCU stall or softlockup detector. The primary reason for the printk was to catch potential users who might depend on the behavior prior to 44a70adec910 ("mm, oom_adj: make sure processes sharing mm have same view of oom_score_adj") but after more than 2 years without a single report I guess it is safe to simply remove the printk altogether. The next step should be moving oom_score_adj over to the mm struct and remove all the tasks crawling as suggested by [2] [1] http://lkml.kernel.org/r/97fce864-6f75-bca5-14bc-12c9f890e740@i-love.sakura.ne.jp [2] http://lkml.kernel.org/r/20190117155159.GA4087@dhcp22.suse.cz Link: http://lkml.kernel.org/r/20190212102129.26288-1-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: David Rientjes <rientjes@google.com> Cc: Yong-Taek Lee <ytk.lee@samsung.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs/proc/base.c: slightly faster /proc/*/limitsAlexey Dobriyan2019-01-041-2/+4
| | | | | | | | | | | Header of /proc/*/limits is a fixed string, so print it directly without formatting specifiers. Link: http://lkml.kernel.org/r/20181203164242.GB6904@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs/proc/base.c: use ns_capable instead of capable for timerslack_nsBenjamin Gordon2019-01-041-3/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Access to timerslack_ns is controlled by a process having CAP_SYS_NICE in its effective capability set, but the current check looks in the root namespace instead of the process' user namespace. Since a process is allowed to do other activities controlled by CAP_SYS_NICE inside a namespace, it should also be able to adjust timerslack_ns. Link: http://lkml.kernel.org/r/20181030180012.232896-1-bmgordon@google.com Signed-off-by: Benjamin Gordon <bmgordon@google.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: John Stultz <john.stultz@linaro.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Kees Cook <keescook@chromium.org> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Oren Laadan <orenl@cellrox.com> Cc: Ruchi Kandoi <kandoiruchi@google.com> Cc: Rom Lemarchand <romlem@android.com> Cc: Todd Kjos <tkjos@google.com> Cc: Colin Cross <ccross@android.com> Cc: Nick Kralevich <nnk@google.com> Cc: Dmitry Shmidt <dimitrysh@google.com> Cc: Elliott Hughes <enh@google.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: convert totalram_pages and totalhigh_pages variables to atomicArun KS2018-12-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | totalram_pages and totalhigh_pages are made static inline function. Main motivation was that managed_page_count_lock handling was complicating things. It was discussed in length here, https://lore.kernel.org/patchwork/patch/995739/#1181785 So it seemes better to remove the lock and convert variables to atomic, with preventing poteintial store-to-read tearing as a bonus. [akpm@linux-foundation.org: coding style fixes] Link: http://lkml.kernel.org/r/1542090790-21750-4-git-send-email-arunks@codeaurora.org Signed-off-by: Arun KS <arunks@codeaurora.org> Suggested-by: Michal Hocko <mhocko@suse.com> Suggested-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge tag 'stackleak-v4.20-rc1' of ↵Linus Torvalds2018-11-011-0/+18
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull stackleak gcc plugin from Kees Cook: "Please pull this new GCC plugin, stackleak, for v4.20-rc1. This plugin was ported from grsecurity by Alexander Popov. It provides efficient stack content poisoning at syscall exit. This creates a defense against at least two classes of flaws: - Uninitialized stack usage. (We continue to work on improving the compiler to do this in other ways: e.g. unconditional zero init was proposed to GCC and Clang, and more plugin work has started too). - Stack content exposure. By greatly reducing the lifetime of valid stack contents, exposures via either direct read bugs or unknown cache side-channels become much more difficult to exploit. This complements the existing buddy and heap poisoning options, but provides the coverage for stacks. The x86 hooks are included in this series (which have been reviewed by Ingo, Dave Hansen, and Thomas Gleixner). The arm64 hooks have already been merged through the arm64 tree (written by Laura Abbott and reviewed by Mark Rutland and Will Deacon). With VLAs having been removed this release, there is no need for alloca() protection, so it has been removed from the plugin" * tag 'stackleak-v4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: arm64: Drop unneeded stackleak_check_alloca() stackleak: Allow runtime disabling of kernel stack erasing doc: self-protection: Add information about STACKLEAK feature fs/proc: Show STACKLEAK metrics in the /proc file system lkdtm: Add a test for STACKLEAK gcc-plugins: Add STACKLEAK plugin for tracking the kernel stack x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls
| * fs/proc: Show STACKLEAK metrics in the /proc file systemAlexander Popov2018-09-041-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | Introduce CONFIG_STACKLEAK_METRICS providing STACKLEAK information about tasks via the /proc file system. In particular, /proc/<pid>/stack_depth shows the maximum kernel stack consumption for the current and previous syscalls. Although this information is not precise, it can be useful for estimating the STACKLEAK performance impact for your workloads. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Alexander Popov <alex.popov@linux.com> Tested-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Kees Cook <keescook@chromium.org>
* | proc: restrict kernel stack dumps to rootJann Horn2018-10-051-0/+14
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, you can use /proc/self/task/*/stack to cause a stack walk on a task you control while it is running on another CPU. That means that the stack can change under the stack walker. The stack walker does have guards against going completely off the rails and into random kernel memory, but it can interpret random data from your kernel stack as instruction pointers and stack pointers. This can cause exposure of kernel stack contents to userspace. Restrict the ability to inspect kernel stacks of arbitrary tasks to root in order to prevent a local attacker from exploiting racy stack unwinding to leak kernel task stack contents. See the added comment for a longer rationale. There don't seem to be any users of this userspace API that can't gracefully bail out if reading from the file fails. Therefore, I believe that this change is unlikely to break things. In the case that this patch does end up needing a revert, the next-best solution might be to fake a single-entry stack based on wchan. Link: http://lkml.kernel.org/r/20180927153316.200286-1-jannh@google.com Fixes: 2ec220e27f50 ("proc: add /proc/*/stack") Signed-off-by: Jann Horn <jannh@google.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Ken Chen <kenchen@google.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* proc: use macro in /proc/latency hookAlexey Dobriyan2018-08-221-1/+1
| | | | | | | | | | | | | | ->latency_record is defined as struct latency_record[LT_SAVECOUNT]; so use the same macro whie iterating. Link: http://lkml.kernel.org/r/20180627200534.GA18434@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* proc: save 2 atomic ops on write to "/proc/*/attr/*"Alexey Dobriyan2018-08-221-19/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Code checks if write is done by current to its own attributes. For that get/put pair is unnecessary as it can be done under RCU. Note: rcu_read_unlock() can be done even earlier since pointer to a task is not dereferenced. It depends if /proc code should look scary or not: rcu_read_lock(); task = pid_task(...); rcu_read_unlock(); if (!task) return -ESRCH; if (task != current) return -EACCESS: P.S.: rename "length" variable. Code like this length = -EINVAL; should not exist. Link: http://lkml.kernel.org/r/20180627200218.GF18113@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* proc: put task earlier in /proc/*/fail-nthAlexey Dobriyan2018-08-221-3/+1
| | | | | | | | Link: http://lkml.kernel.org/r/20180627195427.GE18113@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: /proc/pid/*maps remove is_pid and related wrappersVlastimil Babka2018-08-221-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch series "cleanups and refactor of /proc/pid/smaps*". The recent regression in /proc/pid/smaps made me look more into the code. Especially the issues with smaps_rollup reported in [1] as explained in Patch 4, which fixes them by refactoring the code. Patches 2 and 3 are preparations for that. Patch 1 is me realizing that there's a lot of boilerplate left from times where we tried (unsuccessfuly) to mark thread stacks in the output. Originally I had also plans to rework the translation from /proc/pid/*maps* file offsets to the internal structures. Now the offset means "vma number", which is not really stable (vma's can come and go between read() calls) and there's an extra caching of last vma's address. My idea was that offsets would be interpreted directly as addresses, which would also allow meaningful seeks (see the ugly seek_to_smaps_entry() in tools/testing/selftests/vm/mlock2.h). However loff_t is (signed) long long so that might be insufficient somewhere for the unsigned long addresses. So the result is fixed issues with skewed /proc/pid/smaps_rollup results, simpler smaps code, and a lot of unused code removed. [1] https://marc.info/?l=linux-mm&m=151927723128134&w=2 This patch (of 4): Commit b76437579d13 ("procfs: mark thread stack correctly in proc/<pid>/maps") introduced differences between /proc/PID/maps and /proc/PID/task/TID/maps to mark thread stacks properly, and this was also done for smaps and numa_maps. However it didn't work properly and was ultimately removed by commit b18cb64ead40 ("fs/proc: Stop trying to report thread stacks"). Now the is_pid parameter for the related show_*() functions is unused and we can remove it together with wrapper functions and ops structures that differ for PID and TID cases only in this parameter. Link: http://lkml.kernel.org/r/20180723111933.15443-2-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Daniel Colascione <dancol@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* proc: fix missing final NUL in get_mm_cmdline() rewriteLinus Torvalds2018-06-201-5/+23
| | | | | | | | | | | | | | | | | | | | | The rewrite of the cmdline fetching missed the fact that we used to also return the final terminating NUL character of the last argument. I hadn't noticed, and none of the tools I tested cared, but something obviously must care, because Michal Kubecek noticed the change in behavior. Tweak the "find the end" logic to actually include the NUL character, and once past the eend of argv, always start the strnlen() at the expected (original) argument end. This whole "allow people to rewrite their arguments in place" is a nasty hack and requires that odd slop handling at the end of the argv array, but it's our traditional model, so we continue to support it. Repored-and-bisected-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-and-tested-by: Michal Kubecek <mkubecek@suse.cz> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* proc: skip branch in /proc/*/* lookupAlexey Dobriyan2018-06-151-6/+3
| | | | | | | | | | | | | | | | | | | | | | | Code is structured like this: for ( ... p < last; p++) { if (memcmp == 0) break; } if (p >= last) ERROR OK gcc doesn't see that if if lookup succeeds than post loop branch will never be taken and skip it. [akpm@linux-foundation.org: proc_pident_instantiate() no longer takes an inode*] Link: http://lkml.kernel.org/r/20180423213954.GD9043@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* treewide: kmalloc() -> kmalloc_array()Kees Cook2018-06-121-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The kmalloc() function has a 2-factor argument form, kmalloc_array(). This patch replaces cases of: kmalloc(a * b, gfp) with: kmalloc_array(a * b, gfp) as well as handling cases of: kmalloc(a * b * c, gfp) with: kmalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kmalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kmalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The tools/ directory was manually excluded, since it has its own implementation of kmalloc(). The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kmalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kmalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kmalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(u8) * COUNT + COUNT , ...) | kmalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kmalloc( - sizeof(char) * COUNT + COUNT , ...) | kmalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kmalloc + kmalloc_array ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kmalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kmalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kmalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kmalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kmalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kmalloc(C1 * C2 * C3, ...) | kmalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kmalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kmalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kmalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kmalloc(sizeof(THING) * C2, ...) | kmalloc(sizeof(TYPE) * C2, ...) | kmalloc(C1 * C2 * C3, ...) | kmalloc(C1 * C2, ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - (E1) * E2 + E1, E2 , ...) | - kmalloc + kmalloc_array ( - (E1) * (E2) + E1, E2 , ...) | - kmalloc + kmalloc_array ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
* Merge branch 'proc-cmdline'Linus Torvalds2018-06-091-112/+99
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge proc_cmdline simplifications. This re-writes the get_mm_cmdline() logic to be rather simpler than it used to be, and makes the semantics for "cmdline goes past the end of the original area" more natural. You _can_ use prctl(PR_SET_MM) to just point your command line somewhere else entirely, but the traditional model is to just edit things in place and that still needs to continue to work. At least this way the code makes some sense. * proc-cmdline: fs/proc: simplify and clarify get_mm_cmdline() function fs/proc: re-factor proc_pid_cmdline_read() a bit
| * fs/proc: simplify and clarify get_mm_cmdline() functionLinus Torvalds2018-05-171-121/+65
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have some very odd semantics for reading the command line through /proc, because we allow people to rewrite their own command line pretty much at will, and things get positively funky when you extend your command line past the point that used to be the end of the command line, and is now in the environment variable area. But our weird semantics doesn't mean that we should write weird and complex code to handle them. So re-write get_mm_cmdline() to be much simpler, and much more explicit about what it is actually doing and why. And avoid the extra check for "is there a NUL character at the end of the command line where I expect one to be", by simply making the NUL character handling be part of the normal "once you hit the end of the command line, stop at the first NUL character" logic. It's quite possible that we should stop the crazy "walk into environment" entirely, but happily it's not really the usual case. NOTE! We tried to really simplify and limit our odd cmdline parsing some time ago, but people complained. See commit c2c0bb44620d ("proc: fix PAGE_SIZE limit of /proc/$PID/cmdline") for details about why we have this complexity. Cc: Tejun Heo <tj@kernel.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Jarod Wilson <jarod@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * fs/proc: re-factor proc_pid_cmdline_read() a bitLinus Torvalds2018-05-171-25/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | This is a pure refactoring of the function, preparing for some further cleanups. The thing was pretty illegible, and the core functionality still is, but now the core loop is a bit more isolated from the thing that goes on around it. This was "inspired" by the confluence of kworker workqueue name cleanups by Tejun, currently scheduled for 4.18, and commit 7f7ccc2ccc2e ("proc: do not access cmdline nor environ from file-backed areas"). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge branch 'work.lookup' of ↵Linus Torvalds2018-06-081-3/+3
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull proc_fill_cache regression fix from Al Viro: "Regression fix for proc_fill_cache() braino introduced when switching instantiate() callback to d_splice_alias()" * 'work.lookup' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fix proc_fill_cache() in case of d_alloc_parallel() failure
| * | fix proc_fill_cache() in case of d_alloc_parallel() failureAl Viro2018-06-081-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If d_alloc_parallel() returns ERR_PTR(...), we don't want to dput() that. Small reorganization allows to have all error-in-lookup cases rejoin the main codepath after dput(child), avoiding the entire problem. Spotted-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Fixes: 0168b9e38c42 "procfs: switch instantiate_t to d_splice_alias()" Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | proc: use "unsigned int" for /proc/*/stackAlexey Dobriyan2018-06-071-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | struct stack_trace::nr_entries is defined as "unsigned int" (YAY!) so the iterator should be unsigned as well. It saves 1 byte of code or something like that. Link: http://lkml.kernel.org/r/20180423215248.GG9043@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | proc: use "unsigned int" in proc_fill_cache()Alexey Dobriyan2018-06-071-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All those lengths are unsigned as they should be. Link: http://lkml.kernel.org/r/20180423213751.GC9043@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | proc: smaller RCU section in ->getattr()Alexey Dobriyan2018-06-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | struct kstat is thread local. Link: http://lkml.kernel.org/r/20180423213626.GB9043@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | proc: deduplicate /proc/*/cmdline implementationAlexey Dobriyan2018-06-071-33/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Code can be sonsolidated if a dummy region of 0 length is used in normal case of \0-separated command line: 1) [arg_start, arg_end) + [dummy len=0] 2) [arg_start, arg_end) + [env_start, env_end) Link: http://lkml.kernel.org/r/20180221193335.GB28678@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | proc: simpler iterations for /proc/*/cmdlineAlexey Dobriyan2018-06-071-19/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | "rv" variable is used both as a counter of bytes transferred and an error value holder but it can be reduced solely to error values if original start of userspace buffer is stashed and used at the very end. [akpm@linux-foundation.org: simplify cleanup code] Link: http://lkml.kernel.org/r/20180221193009.GA28678@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | proc: somewhat simpler code for /proc/*/cmdlineAlexey Dobriyan2018-06-071-15/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | "final" variable is OK but we can get away with less lines. Link: http://lkml.kernel.org/r/20180221192751.GC28548@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | proc: more "unsigned int" in /proc/*/cmdlineAlexey Dobriyan2018-06-071-17/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | access_remote_vm() doesn't return negative errors, it returns number of bytes read/written (0 if error occurs). This allows to delete some comparisons which never trigger. Reuse "nr_read" variable while I'm at it. Link: http://lkml.kernel.org/r/20180221192605.GB28548@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_structYang Shi2018-06-071-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mmap_sem is on the hot path of kernel, and it very contended, but it is abused too. It is used to protect arg_start|end and evn_start|end when reading /proc/$PID/cmdline and /proc/$PID/environ, but it doesn't make sense since those proc files just expect to read 4 values atomically and not related to VM, they could be set to arbitrary values by C/R. And, the mmap_sem contention may cause unexpected issue like below: INFO: task ps:14018 blocked for more than 120 seconds. Tainted: G E 4.9.79-009.ali3000.alios7.x86_64 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. ps D 0 14018 1 0x00000004 Call Trace: schedule+0x36/0x80 rwsem_down_read_failed+0xf0/0x150 call_rwsem_down_read_failed+0x18/0x30 down_read+0x20/0x40 proc_pid_cmdline_read+0xd9/0x4e0 __vfs_read+0x37/0x150 vfs_read+0x96/0x130 SyS_read+0x55/0xc0 entry_SYSCALL_64_fastpath+0x1a/0xc5 Both Alexey Dobriyan and Michal Hocko suggested to use dedicated lock for them to mitigate the abuse of mmap_sem. So, introduce a new spinlock in mm_struct to protect the concurrent access to arg_start|end, env_start|end and others, as well as replace write map_sem to read to protect the race condition between prctl and sys_brk which might break check_data_rlimit(), and makes prctl more friendly to other VM operations. This patch just eliminates the abuse of mmap_sem, but it can't resolve the above hung task warning completely since the later access_remote_vm() call needs acquire mmap_sem. The mmap_sem scalability issue will be solved in the future. [yang.shi@linux.alibaba.com: add comment about mmap_sem and arg_lock] Link: http://lkml.kernel.org/r/1524077799-80690-1-git-send-email-yang.shi@linux.alibaba.com Link: http://lkml.kernel.org/r/1523730291-109696-1-git-send-email-yang.shi@linux.alibaba.com Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mateusz Guzik <mguzik@redhat.com> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge branch 'for-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds2018-06-051-3/+2
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull workqueue updates from Tejun Heo: - make kworkers report the workqueue it is executing or has executed most recently in /proc/PID/comm (so they show up in ps/top) - CONFIG_SMP shuffle to move stuff which isn't necessary for UP builds inside CONFIG_SMP. * 'for-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: move function definitions within CONFIG_SMP block workqueue: Make sure struct worker is accessible for wq_worker_comm() workqueue: Show the latest workqueue name in /proc/PID/{comm,stat,status} proc: Consolidate task->comm formatting into proc_task_name() workqueue: Set worker->desc to workqueue name by default workqueue: Make worker_attach/detach_pool() update worker->pool workqueue: Replace pool->attach_mutex with global wq_pool_attach_mutex
| * | | proc: Consolidate task->comm formatting into proc_task_name()Tejun Heo2018-05-181-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | proc shows task->comm in three places - comm, stat, status - and each is fetching and formatting task->comm slighly differently. This patch renames task_name() to proc_task_name(), makes it more generic, and updates all three paths to use it. This will enable expanding comm reporting for workqueue workers. Signed-off-by: Tejun Heo <tj@kernel.org>
* | | | Merge branch 'work.lookup' of ↵Linus Torvalds2018-06-041-75/+61
|\ \ \ \ | | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull dcache lookup cleanups from Al Viro: "Cleaning ->lookup() instances up - mostly d_splice_alias() conversions" * 'work.lookup' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (29 commits) switch the rest of procfs lookups to d_splice_alias() procfs: switch instantiate_t to d_splice_alias() don't bother with tid_fd_revalidate() in lookups proc_lookupfd_common(): don't bother with instantiate unless the file is open procfs: get rid of ancient BS in pid_revalidate() uses cifs_lookup(): switch to d_splice_alias() cifs_lookup(): cifs_get_inode_...() never returns 0 with *inode left NULL 9p: unify paths in v9fs_vfs_lookup() ncp_lookup(): use d_splice_alias() hfsplus: switch to d_splice_alias() hfs: don't allow mounting over .../rsrc hfs: use d_splice_alias() omfs_lookup(): report IO errors, use d_splice_alias() orangefs_lookup: simplify openpromfs: switch to d_splice_alias() xfs_vn_lookup: simplify a bit adfs_lookup: do not fail with ENOENT on negatives, use d_splice_alias() adfs_lookup_byname: .. *is* taken care of in fs/namei.c romfs_lookup: switch to d_splice_alias() qnx6_lookup: switch to d_splice_alias() ...
| * | | procfs: switch instantiate_t to d_splice_alias()Al Viro2018-05-261-50/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | ... and get rid of pointless struct inode *dir argument of those, while we are at it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | procfs: get rid of ancient BS in pid_revalidate() usesAl Viro2018-05-221-31/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | First of all, calling pid_revalidate() in the end of <pid>/* lookups is *not* about closing any kind of races; that used to be true once upon a time, but these days those comments are actively misleading. Especially since pid_revalidate() doesn't even do d_drop() on failure anymore. It doesn't matter, anyway, since once pid_revalidate() starts returning false, ->d_delete() of those dentries starts saying "don't keep"; they won't get stuck in dcache any longer than they are pinned. These calls cannot be just removed, though - the side effect of pid_revalidate() (updating i_uid/i_gid/etc.) is what we are calling it for here. Let's separate the "update ownership" into a new helper (pid_update_inode()) and use it, both in lookups and in pid_revalidate() itself. The comments in pid_revalidate() are also out of date - they refer to the time when pid_revalidate() used to call d_drop() directly... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | | Merge branch 'hch.procfs' of ↵Linus Torvalds2018-06-041-10/+8
|\ \ \ \ | |_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull procfs updates from Al Viro: "Christoph's proc_create_... cleanups series" * 'hch.procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (44 commits) xfs, proc: hide unused xfs procfs helpers isdn/gigaset: add back gigaset_procinfo assignment proc: update SIZEOF_PDE_INLINE_NAME for the new pde fields tty: replace ->proc_fops with ->proc_show ide: replace ->proc_fops with ->proc_show ide: remove ide_driver_proc_write isdn: replace ->proc_fops with ->proc_show atm: switch to proc_create_seq_private atm: simplify procfs code bluetooth: switch to proc_create_seq_data netfilter/x_tables: switch to proc_create_seq_private netfilter/xt_hashlimit: switch to proc_create_{seq,single}_data neigh: switch to proc_create_seq_data hostap: switch to proc_create_{seq,single}_data bonding: switch to proc_create_seq_data rtc/proc: switch to proc_create_single_data drbd: switch to proc_create_single resource: switch to proc_create_seq_data staging/rtl8192u: simplify procfs code jfs: simplify procfs code ...
| * | | proc: introduce a proc_pid_ns helperChristoph Hellwig2018-05-161-10/+8
| | |/ | |/| | | | | | | | | | | | | | | | | | | Factor out retrieving the per-sb pid namespaces from the sb private data into an easier to understand helper. Suggested-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
* | | proc: do not access cmdline nor environ from file-backed areasWilly Tarreau2018-05-171-4/+4
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | proc_pid_cmdline_read() and environ_read() directly access the target process' VM to retrieve the command line and environment. If this process remaps these areas onto a file via mmap(), the requesting process may experience various issues such as extra delays if the underlying device is slow to respond. Let's simply refuse to access file-backed areas in these functions. For this we add a new FOLL_ANON gup flag that is passed to all calls to access_remote_vm(). The code already takes care of such failures (including unmapped areas). Accesses via /proc/pid/mem were not changed though. This was assigned CVE-2018-1120. Note for stable backports: the patch may apply to kernels prior to 4.11 but silently miss one location; it must be checked that no call to access_remote_vm() keeps zero as the last argument. Reported-by: Qualys Security Advisory <qsa@qualys.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Oleg Nesterov <oleg@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | proc: revalidate kernel thread inodes to root:rootAlexey Dobriyan2018-04-201-0/+6
|/ | | | | | | | | | | | | | | | | | | | task_dump_owner() has the following code: mm = task->mm; if (mm) { if (get_dumpable(mm) != SUID_DUMP_USER) { uid = ... } } Check for ->mm is buggy -- kernel thread might be borrowing mm and inode will go to some random uid:gid pair. Link: http://lkml.kernel.org/r/20180412220109.GA20978@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* proc: do mmput ASAP for /proc/*/map_filesAlexey Dobriyan2018-04-111-1/+1
| | | | | | | | | | | mm_struct is not needed while printing as all the data was already extracted. Link: http://lkml.kernel.org/r/20180309223120.GC3843@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* proc: fix /proc/*/map_files lookup some moreAlexey Dobriyan2018-04-111-0/+4
| | | | | | | | | | | | | | | | | | | | | | | I totally forgot that _parse_integer() accepts arbitrary amount of leading zeroes leading to the following lookups: OK # readlink /proc/1/map_files/56427ecba000-56427eddc000 /lib/systemd/systemd bogus # readlink /proc/1/map_files/00000000000056427ecba000-56427eddc000 /lib/systemd/systemd # readlink /proc/1/map_files/56427ecba000-00000000000056427eddc000 /lib/systemd/systemd Link: http://lkml.kernel.org/r/20180303215130.GA23480@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Pavel Emelyanov <xemul@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* proc: use set_puts() at /proc/*/wchanAlexey Dobriyan2018-04-111-1/+1
| | | | | | | | | | Link: http://lkml.kernel.org/r/20180217072011.GB16074@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Rasmus Villemoes <rasmus.villemoes@prevas.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* proc: check permissions earlier for /proc/*/wchanAlexey Dobriyan2018-04-111-5/+8
| | | | | | | | | | | | | get_wchan() accesses stack page before permissions are checked, let's not play this game. Link: http://lkml.kernel.org/r/20180217071923.GA16074@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Rasmus Villemoes <rasmus.villemoes@prevas.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* procfs: remove CONFIG_HARDWALL dependencyArnd Bergmann2018-03-161-9/+0
| | | | | | | | Hardwall is a tile specific feature, and with the removal of the tile architecture, this has become dead code, so let's remove it. Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
* fs/proc: use __ro_after_initAlexey Dobriyan2018-02-061-2/+3
| | | | | | | | | | /proc/self inode numbers, value of proc_inode_cache and st_nlink of /proc/$TGID are fixed constants. Link: http://lkml.kernel.org/r/20180103184707.GA31849@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* proc: less memory for /proc/*/map_files readdirAlexey Dobriyan2018-02-061-6/+9
| | | | | | | | | | | dentry name can be evaluated later, right before calling into VFS. Also, spend less time under ->mmap_sem. Link: http://lkml.kernel.org/r/20171110163034.GA2534@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* proc: fix /proc/*/map_files lookupAlexey Dobriyan2018-02-061-1/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current code does: if (sscanf(dentry->d_name.name, "%lx-%lx", start, end) != 2) However sscanf() is broken garbage. It silently accepts whitespace between format specifiers (did you know that?). It silently accepts valid strings which result in integer overflow. Do not use sscanf() for any even remotely reliable parsing code. OK # readlink '/proc/1/map_files/55a23af39000-55a23b05b000' /lib/systemd/systemd broken # readlink '/proc/1/map_files/ 55a23af39000-55a23b05b000' /lib/systemd/systemd broken # readlink '/proc/1/map_files/55a23af39000-55a23b05b000 ' /lib/systemd/systemd very broken # readlink '/proc/1/map_files/1000000000000000055a23af39000-55a23b05b000' /lib/systemd/systemd Andrei said: : This patch breaks criu. It was a bug in criu. And this bug is on a minor : path, which works when memfd_create() isn't available. It is a reason why : I ask to not backport this patch to stable kernels. : : In CRIU this bug can be triggered, only if this patch will be backported : to a kernel which version is lower than v3.16. Link: http://lkml.kernel.org/r/20171120212706.GA14325@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* proc: don't use READ_ONCE/WRITE_ONCE for /proc/*/fail-nthAlexey Dobriyan2018-02-061-3/+2
| | | | | | | | | | | READ_ONCE and WRITE_ONCE are useless when there is only one read/write is being made. Link: http://lkml.kernel.org/r/20171120204033.GA9446@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* proc: use %u for pid printing and slightly less stackAlexey Dobriyan2018-02-061-8/+8
| | | | | | | | | | | | | PROC_NUMBUF is 13 which is enough for "negative int + \n + \0". However PIDs and TGIDs are never negative and newline is not a concern, so use just 10 per integer. Link: http://lkml.kernel.org/r/20171120203005.GA27743@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Alexander Viro <viro@ftp.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* proc: show si_ptr in /proc/<pid>/timers without hashingLinus Torvalds2017-12-061-1/+1
| | | | | | | | | | | | | | It's a user pointer, and while the permissions of the file are pretty questionable (should it really be readable to everybody), hashing the pointer isn't going to be the solution. We should take a closer look at more of the /proc/<pid> file permissions in general. Sure, we do want many of them to often be readable (for 'ps' and friends), but I think we should probably do a few conversions from S_IRUGO to S_IRUSR. Reported-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* proc: don't report kernel addresses in /proc/<pid>/stackLinus Torvalds2017-11-271-2/+1
| | | | | | | | | | | | | | This just changes the file to report them as zero, although maybe even that could be removed. I checked, and at least procps doesn't actually seem to parse the 'stack' file at all. And since the file doesn't necessarily even exist (it requires CONFIG_STACKTRACE), possibly other tools don't really use it either. That said, in case somebody parses it with tools, just having that zero there should keep such tools happy. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'work.whack-a-mole' of ↵Linus Torvalds2017-11-221-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull mode_t whack-a-mole from Al Viro: "For all internal uses we want umode_t, which is arch-independent; mode_t (or __kernel_mode_t, for that matter) is wrong outside of userland ABI. Unfortunately, that crap keeps coming back and needs to be put down from time to time..." * 'work.whack-a-mole' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: mode_t whack-a-mole: task_dump_owner()
| * mode_t whack-a-mole: task_dump_owner()Al Viro2017-09-301-1/+1
| | | | | | | | | | | | should be umode_t... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
OpenPOWER on IntegriCloud