<feed xmlns='http://www.w3.org/2005/Atom'>
<title>talos-op-linux/net/netlink, branch v4.4.4</title>
<subtitle>Talos™ II Linux sources for OpenPOWER</subtitle>
<id>https://git.raptorcs.com/git/talos-op-linux/atom?h=v4.4.4</id>
<link rel='self' href='https://git.raptorcs.com/git/talos-op-linux/atom?h=v4.4.4'/>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/'/>
<updated>2015-11-07T01:50:42+00:00</updated>
<entry>
<title>mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd</title>
<updated>2015-11-07T01:50:42+00:00</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@techsingularity.net</email>
</author>
<published>2015-11-07T00:28:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=d0164adc89f6bb374d304ffcc375c6d2652fe67d'/>
<id>urn:sha1:d0164adc89f6bb374d304ffcc375c6d2652fe67d</id>
<content type='text'>
__GFP_WAIT has been used to identify atomic context in callers that hold
spinlocks or are in interrupts.  They are expected to be high priority and
have access one of two watermarks lower than "min" which can be referred
to as the "atomic reserve".  __GFP_HIGH users get access to the first
lower watermark and can be called the "high priority reserve".

Over time, callers had a requirement to not block when fallback options
were available.  Some have abused __GFP_WAIT leading to a situation where
an optimisitic allocation with a fallback option can access atomic
reserves.

This patch uses __GFP_ATOMIC to identify callers that are truely atomic,
cannot sleep and have no alternative.  High priority users continue to use
__GFP_HIGH.  __GFP_DIRECT_RECLAIM identifies callers that can sleep and
are willing to enter direct reclaim.  __GFP_KSWAPD_RECLAIM to identify
callers that want to wake kswapd for background reclaim.  __GFP_WAIT is
redefined as a caller that is willing to enter direct reclaim and wake
kswapd for background reclaim.

This patch then converts a number of sites

o __GFP_ATOMIC is used by callers that are high priority and have memory
  pools for those requests. GFP_ATOMIC uses this flag.

o Callers that have a limited mempool to guarantee forward progress clear
  __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
  into this category where kswapd will still be woken but atomic reserves
  are not used as there is a one-entry mempool to guarantee progress.

o Callers that are checking if they are non-blocking should use the
  helper gfpflags_allow_blocking() where possible. This is because
  checking for __GFP_WAIT as was done historically now can trigger false
  positives. Some exceptions like dm-crypt.c exist where the code intent
  is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
  flag manipulations.

o Callers that built their own GFP flags instead of starting with GFP_KERNEL
  and friends now also need to specify __GFP_KSWAPD_RECLAIM.

The first key hazard to watch out for is callers that removed __GFP_WAIT
and was depending on access to atomic reserves for inconspicuous reasons.
In some cases it may be appropriate for them to use __GFP_HIGH.

The second key hazard is callers that assembled their own combination of
GFP flags instead of starting with something like GFP_KERNEL.  They may
now wish to specify __GFP_KSWAPD_RECLAIM.  It's almost certainly harmless
if it's missed in most cases as other activity will wake kswapd.

Signed-off-by: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Vitaly Wool &lt;vitalywool@gmail.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net</title>
<updated>2015-10-24T13:54:12+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2015-10-24T13:54:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=ba3e2084f268bdfed7627046e58a2218037e15af'/>
<id>urn:sha1:ba3e2084f268bdfed7627046e58a2218037e15af</id>
<content type='text'>
Conflicts:
	net/ipv6/xfrm6_output.c
	net/openvswitch/flow_netlink.c
	net/openvswitch/vport-gre.c
	net/openvswitch/vport-vxlan.c
	net/openvswitch/vport.c
	net/openvswitch/vport.h

The openvswitch conflicts were overlapping changes.  One was
the egress tunnel info fix in 'net' and the other was the
vport -&gt;send() op simplification in 'net-next'.

The xfrm6_output.c conflicts was also a simplification
overlapping a bug fix.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>netlink: fix locking around NETLINK_LIST_MEMBERSHIPS</title>
<updated>2015-10-22T14:18:28+00:00</updated>
<author>
<name>David Herrmann</name>
<email>dh.herrmann@gmail.com</email>
</author>
<published>2015-10-21T09:47:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=47191d65b647af5eb5c82ede70ed4c24b1e93ef4'/>
<id>urn:sha1:47191d65b647af5eb5c82ede70ed4c24b1e93ef4</id>
<content type='text'>
Currently, NETLINK_LIST_MEMBERSHIPS grabs the netlink table while copying
the membership state to user-space. However, grabing the netlink table is
effectively a write_lock_irq(), and as such we should not be triggering
page-faults in the critical section.

This can be easily reproduced by the following snippet:
    int s = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
    void *p = mmap(0, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0);
    int r = getsockopt(s, 0x10e, 9, p, (void*)((char*)p + 4092));

This should work just fine, but currently triggers EFAULT and a possible
WARN_ON below handle_mm_fault().

Fix this by reducing locking of NETLINK_LIST_MEMBERSHIPS to a read-side
lock. The write-lock was overkill in the first place, and the read-lock
allows page-faults just fine.

Reported-by: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Signed-off-by: David Herrmann &lt;dh.herrmann@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net</title>
<updated>2015-10-20T13:08:27+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2015-10-20T13:08:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=26440c835f8b1a491e2704118ac55bf87334366c'/>
<id>urn:sha1:26440c835f8b1a491e2704118ac55bf87334366c</id>
<content type='text'>
Conflicts:
	drivers/net/usb/asix_common.c
	net/ipv4/inet_connection_sock.c
	net/switchdev/switchdev.c

In the inet_connection_sock.c case the request socket hashing scheme
is completely different in net-next.

The other two conflicts were overlapping changes.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>netlink: Trim skb to alloc size to avoid MSG_TRUNC</title>
<updated>2015-10-19T02:34:12+00:00</updated>
<author>
<name>Arad, Ronen</name>
<email>ronen.arad@intel.com</email>
</author>
<published>2015-10-15T08:55:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=db65a3aaf29ecce2e34271d52e8d2336b97bd9fe'/>
<id>urn:sha1:db65a3aaf29ecce2e34271d52e8d2336b97bd9fe</id>
<content type='text'>
netlink_dump() allocates skb based on the calculated min_dump_alloc or
a per socket max_recvmsg_len.
min_alloc_size is maximum space required for any single netdev
attributes as calculated by rtnl_calcit().
max_recvmsg_len tracks the user provided buffer to netlink_recvmsg.
It is capped at 16KiB.
The intention is to avoid small allocations and to minimize the number
of calls required to obtain dump information for all net devices.

netlink_dump packs as many small messages as could fit within an skb
that was sized for the largest single netdev information. The actual
space available within an skb is larger than what is requested. It could
be much larger and up to near 2x with align to next power of 2 approach.

Allowing netlink_dump to use all the space available within the
allocated skb increases the buffer size a user has to provide to avoid
truncaion (i.e. MSG_TRUNG flag set).

It was observed that with many VLANs configured on at least one netdev,
a larger buffer of near 64KiB was necessary to avoid "Message truncated"
error in "ip link" or "bridge [-c[ompressvlans]] vlan show" when
min_alloc_size was only little over 32KiB.

This patch trims skb to allocated size in order to allow the user to
avoid truncation with more reasonable buffer size.

Signed-off-by: Ronen Arad &lt;ronen.arad@intel.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net/netlink: lockdep_genl_is_held can be boolean</title>
<updated>2015-10-09T14:48:59+00:00</updated>
<author>
<name>Yaowei Bai</name>
<email>bywxiaobai@163.com</email>
</author>
<published>2015-10-08T13:28:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=61d03535e4be3a46c1e171a25458237e343195e3'/>
<id>urn:sha1:61d03535e4be3a46c1e171a25458237e343195e3</id>
<content type='text'>
This patch makes lockdep_genl_is_held return bool to improve
readability due to this particular function only using either
one or zero as its return value.

No functional change.

Signed-off-by: Yaowei Bai &lt;bywxiaobai@163.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net</title>
<updated>2015-09-26T23:08:27+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2015-09-26T23:08:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=4963ed48f2c20196d51a447ee87dc2815584fee4'/>
<id>urn:sha1:4963ed48f2c20196d51a447ee87dc2815584fee4</id>
<content type='text'>
Conflicts:
	net/ipv4/arp.c

The net/ipv4/arp.c conflict was one commit adding a new
local variable while another commit was deleting one.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>genetlink: simplify genl_notify</title>
<updated>2015-09-24T19:25:23+00:00</updated>
<author>
<name>Jiri Benc</name>
<email>jbenc@redhat.com</email>
</author>
<published>2015-09-22T16:56:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=92c14d9b5ee86fd6cf136c01b6a87353522aebdd'/>
<id>urn:sha1:92c14d9b5ee86fd6cf136c01b6a87353522aebdd</id>
<content type='text'>
The genl_notify function has too many arguments for no real reason - all
callers use genl_info to get them anyway. Just pass the genl_info down to
genl_notify.

Signed-off-by: Jiri Benc &lt;jbenc@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>netlink: Replace rhash_portid with bound</title>
<updated>2015-09-24T19:07:08+00:00</updated>
<author>
<name>Herbert Xu</name>
<email>herbert@gondor.apana.org.au</email>
</author>
<published>2015-09-22T03:38:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=da314c9923fed553a007785a901fd395b7eb6c19'/>
<id>urn:sha1:da314c9923fed553a007785a901fd395b7eb6c19</id>
<content type='text'>
On Mon, Sep 21, 2015 at 02:20:22PM -0400, Tejun Heo wrote:
&gt;
&gt; store_release and load_acquire are different from the usual memory
&gt; barriers and can't be paired this way.  You have to pair store_release
&gt; and load_acquire.  Besides, it isn't a particularly good idea to

OK I've decided to drop the acquire/release helpers as they don't
help us at all and simply pessimises the code by using full memory
barriers (on some architectures) where only a write or read barrier
is needed.

&gt; depend on memory barriers embedded in other data structures like the
&gt; above.  Here, especially, rhashtable_insert() would have write barrier
&gt; *before* the entry is hashed not necessarily *after*, which means that
&gt; in the above case, a socket which appears to have set bound to a
&gt; reader might not visible when the reader tries to look up the socket
&gt; on the hashtable.

But you are right we do need an explicit write barrier here to
ensure that the hashing is visible.

&gt; There's no reason to be overly smart here.  This isn't a crazy hot
&gt; path, write barriers tend to be very cheap, store_release more so.
&gt; Please just do smp_store_release() and note what it's paired with.

It's not about being overly smart.  It's about actually understanding
what's going on with the code.  I've seen too many instances of
people simply sprinkling synchronisation primitives around without
any knowledge of what is happening underneath, which is just a recipe
for creating hard-to-debug races.

&gt; &gt; @@ -1539,7 +1546,7 @@ static int netlink_bind(struct socket *sock, struct sockaddr *addr,
&gt; &gt;  		}
&gt; &gt;  	}
&gt; &gt;
&gt; &gt; -	if (!nlk-&gt;portid) {
&gt; &gt; +	if (!nlk-&gt;bound) {
&gt;
&gt; I don't think you can skip load_acquire here just because this is the
&gt; second deref of the variable.  That doesn't change anything.  Race
&gt; condition could still happen between the first and second tests and
&gt; skipping the second would lead to the same kind of bug.

The reason this one is OK is because we do not use nlk-&gt;portid or
try to get nlk from the hash table before we return to user-space.

However, there is a real bug here that none of these acquire/release
helpers discovered.  The two bound tests here used to be a single
one.  Now that they are separate it is entirely possible for another
thread to come in the middle and bind the socket.  So we need to
repeat the portid check in order to maintain consistency.

&gt; &gt; @@ -1587,7 +1594,7 @@ static int netlink_connect(struct socket *sock, struct sockaddr *addr,
&gt; &gt;  	    !netlink_allowed(sock, NL_CFG_F_NONROOT_SEND))
&gt; &gt;  		return -EPERM;
&gt; &gt;
&gt; &gt; -	if (!nlk-&gt;portid)
&gt; &gt; +	if (!nlk-&gt;bound)
&gt;
&gt; Don't we need load_acquire here too?  Is this path holding a lock
&gt; which makes that unnecessary?

Ditto.

---8&lt;---
The commit 1f770c0a09da855a2b51af6d19de97fb955eca85 ("netlink:
Fix autobind race condition that leads to zero port ID") created
some new races that can occur due to inconcsistencies between the
two port IDs.

Tejun is right that a barrier is unavoidable.  Therefore I am
reverting to the original patch that used a boolean to indicate
that a user netlink socket has been bound.

Barriers have been added where necessary to ensure that a valid
portid and the hashed socket is visible.

I have also changed netlink_insert to only return EBUSY if the
socket is bound to a portid different to the requested one.  This
combined with only reading nlk-&gt;bound once in netlink_bind fixes
a race where two threads that bind the socket at the same time
with different port IDs may both succeed.

Fixes: 1f770c0a09da ("netlink: Fix autobind race condition that leads to zero port ID")
Reported-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Nacked-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>netlink: Fix autobind race condition that leads to zero port ID</title>
<updated>2015-09-21T05:55:31+00:00</updated>
<author>
<name>Herbert Xu</name>
<email>herbert@gondor.apana.org.au</email>
</author>
<published>2015-09-18T11:16:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=1f770c0a09da855a2b51af6d19de97fb955eca85'/>
<id>urn:sha1:1f770c0a09da855a2b51af6d19de97fb955eca85</id>
<content type='text'>
The commit c0bb07df7d981e4091432754e30c9c720e2c0c78 ("netlink:
Reset portid after netlink_insert failure") introduced a race
condition where if two threads try to autobind the same socket
one of them may end up with a zero port ID.  This led to kernel
deadlocks that were observed by multiple people.

This patch reverts that commit and instead fixes it by introducing
a separte rhash_portid variable so that the real portid is only set
after the socket has been successfully hashed.

Fixes: c0bb07df7d98 ("netlink: Reset portid after netlink_insert failure")
Reported-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
