diff options
author | Roman Gushchin <guro@fb.com> | 2018-02-02 15:26:57 +0000 |
---|---|---|
committer | David S. Miller <davem@davemloft.net> | 2018-02-02 19:49:31 -0500 |
commit | edbe69ef2c90fc86998a74b08319a01c508bd497 (patch) | |
tree | ae886133adf7f3518a47f81db0644fe74732b80b | |
parent | 4db428a7c9ab07e08783e0fcdc4ca0f555da0567 (diff) | |
download | talos-obmc-linux-edbe69ef2c90fc86998a74b08319a01c508bd497.tar.gz talos-obmc-linux-edbe69ef2c90fc86998a74b08319a01c508bd497.zip |
Revert "defer call to mem_cgroup_sk_alloc()"
This patch effectively reverts commit 9f1c2674b328 ("net: memcontrol:
defer call to mem_cgroup_sk_alloc()").
Moving mem_cgroup_sk_alloc() to the inet_csk_accept() completely breaks
memcg socket memory accounting, as packets received before memcg
pointer initialization are not accounted and are causing refcounting
underflow on socket release.
Actually the free-after-use problem was fixed by
commit c0576e397508 ("net: call cgroup_sk_alloc() earlier in
sk_clone_lock()") for the cgroup pointer.
So, let's revert it and call mem_cgroup_sk_alloc() just before
cgroup_sk_alloc(). This is safe, as we hold a reference to the socket
we're cloning, and it holds a reference to the memcg.
Also, let's drop BUG_ON(mem_cgroup_is_root()) check from
mem_cgroup_sk_alloc(). I see no reasons why bumping the root
memcg counter is a good reason to panic, and there are no realistic
ways to hit it.
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
-rw-r--r-- | mm/memcontrol.c | 14 | ||||
-rw-r--r-- | net/core/sock.c | 5 | ||||
-rw-r--r-- | net/ipv4/inet_connection_sock.c | 1 |
3 files changed, 15 insertions, 5 deletions
diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 0ae2dc3a1748..0937f2c52c7d 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -5747,6 +5747,20 @@ void mem_cgroup_sk_alloc(struct sock *sk) if (!mem_cgroup_sockets_enabled) return; + /* + * Socket cloning can throw us here with sk_memcg already + * filled. It won't however, necessarily happen from + * process context. So the test for root memcg given + * the current task's memcg won't help us in this case. + * + * Respecting the original socket's memcg is a better + * decision in this case. + */ + if (sk->sk_memcg) { + css_get(&sk->sk_memcg->css); + return; + } + rcu_read_lock(); memcg = mem_cgroup_from_task(current); if (memcg == root_mem_cgroup) diff --git a/net/core/sock.c b/net/core/sock.c index 1033f8ab0547..e50e7b3f2223 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -1683,16 +1683,13 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority) newsk->sk_dst_pending_confirm = 0; newsk->sk_wmem_queued = 0; newsk->sk_forward_alloc = 0; - - /* sk->sk_memcg will be populated at accept() time */ - newsk->sk_memcg = NULL; - atomic_set(&newsk->sk_drops, 0); newsk->sk_send_head = NULL; newsk->sk_userlocks = sk->sk_userlocks & ~SOCK_BINDPORT_LOCK; atomic_set(&newsk->sk_zckey, 0); sock_reset_flag(newsk, SOCK_DONE); + mem_cgroup_sk_alloc(newsk); cgroup_sk_alloc(&newsk->sk_cgrp_data); rcu_read_lock(); diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index 12410ec6f7f7..881ac6d046f2 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -475,7 +475,6 @@ struct sock *inet_csk_accept(struct sock *sk, int flags, int *err, bool kern) } spin_unlock_bh(&queue->fastopenq.lock); } - mem_cgroup_sk_alloc(newsk); out: release_sock(sk); if (req) |