<feed xmlns='http://www.w3.org/2005/Atom'>
<title>talos-op-linux/kernel/ptrace.c, branch v4.4.17</title>
<subtitle>Talos™ II Linux sources for OpenPOWER</subtitle>
<id>https://git.raptorcs.com/git/talos-op-linux/atom?h=v4.4.17</id>
<link rel='self' href='https://git.raptorcs.com/git/talos-op-linux/atom?h=v4.4.17'/>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/'/>
<updated>2016-02-25T20:01:16+00:00</updated>
<entry>
<title>ptrace: use fsuid, fsgid, effective creds for fs access checks</title>
<updated>2016-02-25T20:01:16+00:00</updated>
<author>
<name>Jann Horn</name>
<email>jann@thejh.net</email>
</author>
<published>2016-01-20T23:00:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=969624b7c1c8c9784651eb97431e6f2bbb7a024c'/>
<id>urn:sha1:969624b7c1c8c9784651eb97431e6f2bbb7a024c</id>
<content type='text'>
commit caaee6234d05a58c5b4d05e7bf766131b810a657 upstream.

By checking the effective credentials instead of the real UID / permitted
capabilities, ensure that the calling process actually intended to use its
credentials.

To ensure that all ptrace checks use the correct caller credentials (e.g.
in case out-of-tree code or newly added code omits the PTRACE_MODE_*CREDS
flag), use two new flags and require one of them to be set.

The problem was that when a privileged task had temporarily dropped its
privileges, e.g.  by calling setreuid(0, user_uid), with the intent to
perform following syscalls with the credentials of a user, it still passed
ptrace access checks that the user would not be able to pass.

While an attacker should not be able to convince the privileged task to
perform a ptrace() syscall, this is a problem because the ptrace access
check is reused for things in procfs.

In particular, the following somewhat interesting procfs entries only rely
on ptrace access checks:

 /proc/$pid/stat - uses the check for determining whether pointers
     should be visible, useful for bypassing ASLR
 /proc/$pid/maps - also useful for bypassing ASLR
 /proc/$pid/cwd - useful for gaining access to restricted
     directories that contain files with lax permissions, e.g. in
     this scenario:
     lrwxrwxrwx root root /proc/13020/cwd -&gt; /root/foobar
     drwx------ root root /root
     drwxr-xr-x root root /root/foobar
     -rw-r--r-- root root /root/foobar/secret

Therefore, on a system where a root-owned mode 6755 binary changes its
effective credentials as described and then dumps a user-specified file,
this could be used by an attacker to reveal the memory layout of root's
processes or reveal the contents of files he is not allowed to access
(through /proc/$pid/cwd).

[akpm@linux-foundation.org: fix warning]
Signed-off-by: Jann Horn &lt;jann@thejh.net&gt;
Acked-by: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Casey Schaufler &lt;casey@schaufler-ca.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: James Morris &lt;james.l.morris@oracle.com&gt;
Cc: "Serge E. Hallyn" &lt;serge.hallyn@ubuntu.com&gt;
Cc: Andy Shevchenko &lt;andriy.shevchenko@linux.intel.com&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Cc: Willy Tarreau &lt;w@1wt.eu&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>seccomp, ptrace: add support for dumping seccomp filters</title>
<updated>2015-10-28T02:55:13+00:00</updated>
<author>
<name>Tycho Andersen</name>
<email>tycho.andersen@canonical.com</email>
</author>
<published>2015-10-27T00:23:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=f8e529ed941ba2bbcbf310b575d968159ce7e895'/>
<id>urn:sha1:f8e529ed941ba2bbcbf310b575d968159ce7e895</id>
<content type='text'>
This patch adds support for dumping a process' (classic BPF) seccomp
filters via ptrace.

PTRACE_SECCOMP_GET_FILTER allows the tracer to dump the user's classic BPF
seccomp filters. addr should be an integer which represents the ith seccomp
filter (0 is the most recently installed filter). data should be a struct
sock_filter * with enough room for the ith filter, or NULL, in which case
the filter is not saved. The return value for this command is the number of
BPF instructions the program represents, or negative in the case of errors.
Command specific errors are ENOENT: which indicates that there is no ith
filter in this seccomp tree, and EMEDIUMTYPE, which indicates that the ith
filter was not installed as a classic BPF filter.

A caveat with this approach is that there is no way to get explicitly at
the heirarchy of seccomp filters, and users need to memcmp() filters to
decide which are inherited. This means that a task which installs two of
the same filter can potentially confuse users of this interface.

v2: * make save_orig const
    * check that the orig_prog exists (not necessary right now, but when
       grows eBPF support it will be)
    * s/n/filter_off and make it an unsigned long to match ptrace
    * count "down" the tree instead of "up" when passing a filter offset

v3: * don't take the current task's lock for inspecting its seccomp mode
    * use a 0x42** constant for the ptrace command value

v4: * don't copy to userspace while holding spinlocks

v5: * add another condition to WARN_ON

v6: * rebase on net-next

Signed-off-by: Tycho Andersen &lt;tycho.andersen@canonical.com&gt;
Acked-by: Kees Cook &lt;keescook@chromium.org&gt;
CC: Will Drewry &lt;wad@chromium.org&gt;
Reviewed-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
CC: Andy Lutomirski &lt;luto@amacapital.net&gt;
CC: Pavel Emelyanov &lt;xemul@parallels.com&gt;
CC: Serge E. Hallyn &lt;serge.hallyn@ubuntu.com&gt;
CC: Alexei Starovoitov &lt;ast@kernel.org&gt;
CC: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>seccomp: add ptrace options for suspend/resume</title>
<updated>2015-07-15T18:52:52+00:00</updated>
<author>
<name>Tycho Andersen</name>
<email>tycho.andersen@canonical.com</email>
</author>
<published>2015-06-13T15:02:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=13c4a90119d28cfcb6b5bdd820c233b86c2b0237'/>
<id>urn:sha1:13c4a90119d28cfcb6b5bdd820c233b86c2b0237</id>
<content type='text'>
This patch is the first step in enabling checkpoint/restore of processes
with seccomp enabled.

One of the things CRIU does while dumping tasks is inject code into them
via ptrace to collect information that is only available to the process
itself. However, if we are in a seccomp mode where these processes are
prohibited from making these syscalls, then what CRIU does kills the task.

This patch adds a new ptrace option, PTRACE_O_SUSPEND_SECCOMP, that enables
a task from the init user namespace which has CAP_SYS_ADMIN and no seccomp
filters to disable (and re-enable) seccomp filters for another task so that
they can be successfully dumped (and restored). We restrict the set of
processes that can disable seccomp through ptrace because although today
ptrace can be used to bypass seccomp, there is some discussion of closing
this loophole in the future and we would like this patch to not depend on
that behavior and be future proofed for when it is removed.

Note that seccomp can be suspended before any filters are actually
installed; this behavior is useful on criu restore, so that we can suspend
seccomp, restore the filters, unmap our restore code from the restored
process' address space, and then resume the task by detaching and have the
filters resumed as well.

v2 changes:

* require that the tracer have no seccomp filters installed
* drop TIF_NOTSC manipulation from the patch
* change from ptrace command to a ptrace option and use this ptrace option
  as the flag to check. This means that as soon as the tracer
  detaches/dies, seccomp is re-enabled and as a corrollary that one can not
  disable seccomp across PTRACE_ATTACHs.

v3 changes:

* get rid of various #ifdefs everywhere
* report more sensible errors when PTRACE_O_SUSPEND_SECCOMP is incorrectly
  used

v4 changes:

* get rid of may_suspend_seccomp() in favor of a capable() check in ptrace
  directly

v5 changes:

* check that seccomp is not enabled (or suspended) on the tracer

Signed-off-by: Tycho Andersen &lt;tycho.andersen@canonical.com&gt;
CC: Will Drewry &lt;wad@chromium.org&gt;
CC: Roland McGrath &lt;roland@hack.frob.com&gt;
CC: Pavel Emelyanov &lt;xemul@parallels.com&gt;
CC: Serge E. Hallyn &lt;serge.hallyn@ubuntu.com&gt;
Acked-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Andy Lutomirski &lt;luto@amacapital.net&gt;
[kees: access seccomp.mode through seccomp_mode() instead]
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
</content>
</entry>
<entry>
<title>ptrace: ptrace_detach() can no longer race with SIGKILL</title>
<updated>2015-04-17T13:04:06+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2015-04-16T19:47:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=64a4096c5cdab377b6e1f44008ee8b2636db579d'/>
<id>urn:sha1:64a4096c5cdab377b6e1f44008ee8b2636db579d</id>
<content type='text'>
ptrace_detach() re-checks -&gt;ptrace under tasklist lock and calls
release_task() if __ptrace_detach() returns true.  This was needed because
the __TASK_TRACED tracee could be killed/untraced, and it could even pass
exit_notify() before we take tasklist_lock.

But this is no longer possible after 9899d11f6544 "ptrace: ensure
arch_ptrace/ptrace_request can never race with SIGKILL".  We can turn
these checks into WARN_ON() and remove release_task().

While at it, document the setting of child-&gt;exit_code.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Pavel Labath &lt;labath@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>ptrace: fix race between ptrace_resume() and wait_task_stopped()</title>
<updated>2015-04-17T13:04:06+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2015-04-16T19:47:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=b72c186999e689cb0b055ab1c7b3cd8fffbeb5ed'/>
<id>urn:sha1:b72c186999e689cb0b055ab1c7b3cd8fffbeb5ed</id>
<content type='text'>
ptrace_resume() is called when the tracee is still __TASK_TRACED.  We set
tracee-&gt;exit_code and then wake_up_state() changes tracee-&gt;state.  If the
tracer's sub-thread does wait() in between, task_stopped_code(ptrace =&gt; T)
wrongly looks like another report from tracee.

This confuses debugger, and since wait_task_stopped() clears -&gt;exit_code
the tracee can miss a signal.

Test-case:

	#include &lt;stdio.h&gt;
	#include &lt;unistd.h&gt;
	#include &lt;sys/wait.h&gt;
	#include &lt;sys/ptrace.h&gt;
	#include &lt;pthread.h&gt;
	#include &lt;assert.h&gt;

	int pid;

	void *waiter(void *arg)
	{
		int stat;

		for (;;) {
			assert(pid == wait(&amp;stat));
			assert(WIFSTOPPED(stat));
			if (WSTOPSIG(stat) == SIGHUP)
				continue;

			assert(WSTOPSIG(stat) == SIGCONT);
			printf("ERR! extra/wrong report:%x\n", stat);
		}
	}

	int main(void)
	{
		pthread_t thread;

		pid = fork();
		if (!pid) {
			assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
			for (;;)
				kill(getpid(), SIGHUP);
		}

		assert(pthread_create(&amp;thread, NULL, waiter, NULL) == 0);

		for (;;)
			ptrace(PTRACE_CONT, pid, 0, SIGCONT);

		return 0;
	}

Note for stable: the bug is very old, but without 9899d11f6544 "ptrace:
ensure arch_ptrace/ptrace_request can never race with SIGKILL" the fix
should use lock_task_sighand(child).

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Reported-by: Pavel Labath &lt;labath@google.com&gt;
Tested-by: Pavel Labath &lt;labath@google.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>ptrace: remove linux/compat.h inclusion under CONFIG_COMPAT</title>
<updated>2015-02-17T22:34:51+00:00</updated>
<author>
<name>Fabian Frederick</name>
<email>fabf@skynet.be</email>
</author>
<published>2015-02-17T21:45:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=1cca3385e6d556cd90cdc148c2f26af807fa3600'/>
<id>urn:sha1:1cca3385e6d556cd90cdc148c2f26af807fa3600</id>
<content type='text'>
Commit 84c751bd4aeb ("ptrace: add ability to retrieve signals without
removing from a queue (v4)") includes &lt;linux/compat.h&gt; globally in
ptrace.c

This patch removes inclusion under if defined CONFIG_COMPAT.

Signed-off-by: Fabian Frederick &lt;fabf@skynet.be&gt;
Acked-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>exit: ptrace: shift "reap dead" code from exit_ptrace() to forget_original_parent()</title>
<updated>2014-12-11T01:41:10+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2014-12-10T23:45:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=7c8bd2322c7fd973d089b27de55e29c92c667a06'/>
<id>urn:sha1:7c8bd2322c7fd973d089b27de55e29c92c667a06</id>
<content type='text'>
Now that forget_original_parent() uses -&gt;ptrace_entry for EXIT_DEAD tasks,
we can simply pass "dead_children" list to exit_ptrace() and remove
another release_task() loop.  Plus this way we do not need to drop and
reacquire tasklist_lock.

Also shift the list_empty(ptraced) check, if we want this optimization it
makes sense to eliminate the function call altogether.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Aaron Tomlin &lt;atomlin@redhat.com&gt;
Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;,
Cc: Sterling Alexander &lt;stalexan@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Roland McGrath &lt;roland@hack.frob.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>sched: Remove proliferation of wait_on_bit() action functions</title>
<updated>2014-07-16T13:10:39+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2014-07-07T05:16:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=743162013d40ca612b4cb53d3a200dff2d9ab26e'/>
<id>urn:sha1:743162013d40ca612b4cb53d3a200dff2d9ab26e</id>
<content type='text'>
The current "wait_on_bit" interface requires an 'action'
function to be provided which does the actual waiting.
There are over 20 such functions, many of them identical.
Most cases can be satisfied by one of just two functions, one
which uses io_schedule() and one which just uses schedule().

So:
 Rename wait_on_bit and        wait_on_bit_lock to
        wait_on_bit_action and wait_on_bit_lock_action
 to make it explicit that they need an action function.

 Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io
 which are *not* given an action function but implicitly use
 a standard one.
 The decision to error-out if a signal is pending is now made
 based on the 'mode' argument rather than being encoded in the action
 function.

 All instances of the old wait_on_bit and wait_on_bit_lock which
 can use the new version have been changed accordingly and their
 action functions have been discarded.
 wait_on_bit{_lock} does not return any specific error code in the
 event of a signal so the caller must check for non-zero and
 interpolate their own error code as appropriate.

The wait_on_bit() call in __fscache_wait_on_invalidate() was
ambiguous as it specified TASK_UNINTERRUPTIBLE but used
fscache_wait_bit_interruptible as an action function.
David Howells confirms this should be uniformly
"uninterruptible"

The main remaining user of wait_on_bit{,_lock}_action is NFS
which needs to use a freezer-aware schedule() call.

A comment in fs/gfs2/glock.c notes that having multiple 'action'
functions is useful as they display differently in the 'wchan'
field of 'ps'. (and /proc/$PID/wchan).
As the new bit_wait{,_io} functions are tagged "__sched", they
will not show up at all, but something higher in the stack.  So
the distinction will still be visible, only with different
function names (gds2_glock_wait versus gfs2_glock_dq_wait in the
gfs2/glock.c case).

Since first version of this patch (against 3.15) two new action
functions appeared, on in NFS and one in CIFS.  CIFS also now
uses an action function that makes the same freezer aware
schedule call as NFS.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
Acked-by: David Howells &lt;dhowells@redhat.com&gt; (fscache, keys)
Acked-by: Steven Whitehouse &lt;swhiteho@redhat.com&gt; (gfs2)
Acked-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Steve French &lt;sfrench@samba.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brown
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>kernel/compat: convert to COMPAT_SYSCALL_DEFINE</title>
<updated>2014-03-06T14:35:10+00:00</updated>
<author>
<name>Heiko Carstens</name>
<email>heiko.carstens@de.ibm.com</email>
</author>
<published>2014-03-03T15:11:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=62a6fa97684ed4c124564ea92500ecd513d60611'/>
<id>urn:sha1:62a6fa97684ed4c124564ea92500ecd513d60611</id>
<content type='text'>
Convert all compat system call functions where all parameter types
have a size of four or less than four bytes, or are pointer types
to COMPAT_SYSCALL_DEFINE.
The implicit casts within COMPAT_SYSCALL_DEFINE will perform proper
zero and sign extension to 64 bit of all parameters if needed.

Signed-off-by: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
</content>
</entry>
<entry>
<title>exec/ptrace: fix get_dumpable() incorrect tests</title>
<updated>2013-11-13T03:09:33+00:00</updated>
<author>
<name>Kees Cook</name>
<email>keescook@chromium.org</email>
</author>
<published>2013-11-12T23:11:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=d049f74f2dbe71354d43d393ac3a188947811348'/>
<id>urn:sha1:d049f74f2dbe71354d43d393ac3a188947811348</id>
<content type='text'>
The get_dumpable() return value is not boolean.  Most users of the
function actually want to be testing for non-SUID_DUMP_USER(1) rather than
SUID_DUMP_DISABLE(0).  The SUID_DUMP_ROOT(2) is also considered a
protected state.  Almost all places did this correctly, excepting the two
places fixed in this patch.

Wrong logic:
    if (dumpable == SUID_DUMP_DISABLE) { /* be protective */ }
        or
    if (dumpable == 0) { /* be protective */ }
        or
    if (!dumpable) { /* be protective */ }

Correct logic:
    if (dumpable != SUID_DUMP_USER) { /* be protective */ }
        or
    if (dumpable != 1) { /* be protective */ }

Without this patch, if the system had set the sysctl fs/suid_dumpable=2, a
user was able to ptrace attach to processes that had dropped privileges to
that user.  (This may have been partially mitigated if Yama was enabled.)

The macros have been moved into the file that declares get/set_dumpable(),
which means things like the ia64 code can see them too.

CVE-2013-2929

Reported-by: Vasily Kulikov &lt;segoon@openwall.com&gt;
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Cc: "Luck, Tony" &lt;tony.luck@intel.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
