<feed xmlns='http://www.w3.org/2005/Atom'>
<title>talos-op-linux/net/core, branch v4.8</title>
<subtitle>Talos™ II Linux sources for OpenPOWER</subtitle>
<id>https://git.raptorcs.com/git/talos-op-linux/atom?h=v4.8</id>
<link rel='self' href='https://git.raptorcs.com/git/talos-op-linux/atom?h=v4.8'/>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/'/>
<updated>2016-09-19T22:36:17+00:00</updated>
<entry>
<title>cgroup: duplicate cgroup reference when cloning sockets</title>
<updated>2016-09-19T22:36:17+00:00</updated>
<author>
<name>Johannes Weiner</name>
<email>jweiner@fb.com</email>
</author>
<published>2016-09-19T21:44:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=d979a39d7242e0601bf9b60e89628fb8ac577179'/>
<id>urn:sha1:d979a39d7242e0601bf9b60e89628fb8ac577179</id>
<content type='text'>
When a socket is cloned, the associated sock_cgroup_data is duplicated
but not its reference on the cgroup.  As a result, the cgroup reference
count will underflow when both sockets are destroyed later on.
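
A minimal sketch of the fix's shape (the actual call sites are
sk_clone_lock() and the cgroup socket accounting code; treat the
helper name as illustrative):

  /* Clone path: copy the cgroup data AND take a new reference on
   * the cgroup, so that each socket's destruction drops exactly
   * one reference.
   */
  newsk-&gt;sk_cgrp_data = osk-&gt;sk_cgrp_data;
  cgroup_sk_alloc(&amp;newsk-&gt;sk_cgrp_data);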

Fixes: bd1060a1d671 ("sock, cgroup: add sock-&gt;sk_cgroup")
Link: http://lkml.kernel.org/r/20160914194846.11153-2-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Vladimir Davydov &lt;vdavydov@virtuozzo.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[4.5+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>bonding: Fix bonding crash</title>
<updated>2016-09-04T18:41:12+00:00</updated>
<author>
<name>Mahesh Bandewar</name>
<email>maheshb@google.com</email>
</author>
<published>2016-09-02T05:18:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=24b27fc4cdf9e10c5e79e5923b6b7c2c5c95096c'/>
<id>urn:sha1:24b27fc4cdf9e10c5e79e5923b6b7c2c5c95096c</id>
<content type='text'>
The following steps will crash the kernel -

  (a) Create bonding master
      &gt; modprobe bonding miimon=50
  (b) Create macvlan bridge on eth2
      &gt; ip link add link eth2 dev mvl0 address aa:0:0:0:0:01 \
	   type macvlan
  (c) Now try adding eth2 into the bond
      &gt; echo +eth2 &gt; /sys/class/net/bond0/bonding/slaves
      &lt;crash&gt;

Bonding does a lot of work before checking whether the device being
enslaved is busy or not.

In this case, when the notifier call-chain sends notifications,
bond_netdev_event() assumes that the rx_handler/rx_handler_data is
registered, while bond_enslave() hasn't progressed far enough to
register the rx_handler for the new slave.

This patch adds an rx_handler check right at the beginning of the
enslave code to avoid getting into this situation, as sketched below.
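
A minimal sketch of that early check (helper and error message per
the general approach; names illustrative):

  /* At the very top of bond_enslave(), before any state is
   * touched: refuse a slave whose rx_handler is already claimed,
   * e.g. by a macvlan upper device, instead of failing halfway
   * through.
   */
  if (netdev_is_rx_handler_busy(slave_dev)) {
          netdev_err(bond_dev,
                     "Error: Device is in use and cannot be enslaved\n");
          return -EBUSY;
  }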

Signed-off-by: Mahesh Bandewar &lt;maheshb@google.com&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>rps: flow_dissector: Fix uninitialized flow_keys used in __skb_get_hash possibly</title>
<updated>2016-09-02T05:45:03+00:00</updated>
<author>
<name>Gao Feng</name>
<email>fgao@ikuai8.com</email>
</author>
<published>2016-08-31T06:15:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=635c223cfa05af9523146b2f37e119d945f449ae'/>
<id>urn:sha1:635c223cfa05af9523146b2f37e119d945f449ae</id>
<content type='text'>
The original code depends on the function arguments being evaluated
from left to right, but the C standard actually leaves the argument
evaluation order unspecified.

When flow_keys_have_l4(&amp;keys) is evaluated before ___skb_get_hash(skb,
&amp;keys, hashrnd) under some compilers or environments, the keys passed
to flow_keys_have_l4() are uninitialized.
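
A sketch of the fix: introduce a sequence point so the hash is
computed before flow_keys_have_l4() looks at the keys.

  /* Buggy: nothing orders the two argument evaluations, so
   * flow_keys_have_l4(&amp;keys) may run before keys is filled in.
   */
  __skb_set_sw_hash(skb, ___skb_get_hash(skb, &amp;keys, hashrnd),
                    flow_keys_have_l4(&amp;keys));

  /* Fixed: compute the hash first, then test for l4 keys. */
  u32 hash = ___skb_get_hash(skb, &amp;keys, hashrnd);

  __skb_set_sw_hash(skb, hash, flow_keys_have_l4(&amp;keys));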

Fixes: 6db61d79c1e1 ("flow_dissector: Ignore flow dissector return value from ___skb_get_hash")

Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Gao Feng &lt;fgao@ikuai8.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: remove type_check from dev_get_nest_level()</title>
<updated>2016-08-13T22:15:54+00:00</updated>
<author>
<name>Sabrina Dubroca</name>
<email>sd@queasysnail.net</email>
</author>
<published>2016-08-12T14:10:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=952fcfd08c8109951622579d0ae7b9cd6cafd688'/>
<id>urn:sha1:952fcfd08c8109951622579d0ae7b9cd6cafd688</id>
<content type='text'>
The idea for type_check in dev_get_nest_level() was to count the number
of nested devices of the same type (currently, only macvlan or vlan
devices).
This prevented the false positive lockdep warning on configurations such
as:

eth0 &lt;--- macvlan0 &lt;--- vlan0 &lt;--- macvlan1

However, this doesn't prevent a warning on a configuration such as:

eth0 &lt;--- macvlan0 &lt;--- vlan0
eth1 &lt;--- vlan1 &lt;--- macvlan1

In this case, all the locks end up with a nesting subclass of 1, so
lockdep thinks that there is still a deadlock:

- in the first case we have (macvlan_netdev_addr_lock_key, 1) and then
  take (vlan_netdev_xmit_lock_key, 1)
- in the second case, we have (vlan_netdev_xmit_lock_key, 1) and then
  take (macvlan_netdev_addr_lock_key, 1)

Removing the linktype check in dev_get_nest_level() and always
incrementing the nesting depth makes lockdep consider this
configuration valid.
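
With the check gone, the helper reduces to a plain recursive depth
count over all lower devices; a sketch of that shape:

  int dev_get_nest_level(struct net_device *dev)
  {
          struct net_device *lower = NULL;
          struct list_head *iter;
          int max_nest = -1;
          int nest;

          ASSERT_RTNL();

          netdev_for_each_lower_dev(dev, lower, iter) {
                  nest = dev_get_nest_level(lower);
                  if (max_nest &lt; nest)
                          max_nest = nest;
          }

          return max_nest + 1;
  }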

Signed-off-by: Sabrina Dubroca &lt;sd@queasysnail.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf: fix write helpers with regards to non-linear parts</title>
<updated>2016-08-13T22:01:02+00:00</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2016-08-11T19:38:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=0ed661d5a48fa6df0b50ae64d27fe759a3ce42cf'/>
<id>urn:sha1:0ed661d5a48fa6df0b50ae64d27fe759a3ce42cf</id>
<content type='text'>
Fix the bpf_try_make_writable() helper and all call sites we have in
BPF; it's currently defective with regard to skbs when the write_len
spans into non-linear parts, no matter if cloned or not.

There are multiple issues at once. First, using skb_store_bits() is not
correct since even if we have a cloned skb, page frags can still be shared.
To really make them private, we need to pull them in via __pskb_pull_tail()
first, which also gets us a private head via pskb_expand_head() implicitly.

This is for helpers like bpf_skb_store_bytes(), bpf_l3_csum_replace(),
bpf_l4_csum_replace(). Really, the only thing reasonable and working here
is to call skb_ensure_writable() before any write operation. Meaning, via
pskb_may_pull() it makes sure that parts we want to access are pulled in and
if not does so plus unclones the skb implicitly. If our write_len still fits
the headlen and we're cloned and our header of the clone is not writable,
then we need to make a private copy via pskb_expand_head(). skb_store_bits()
is a bit misleading and only safe to store into non-linear data in different
contexts such as 357b40a18b04 ("[IPV6]: IPV6_CHECKSUM socket option can
corrupt kernel memory").
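
Under that reasoning, a minimal sketch of the fixed helper (call
sites then read and write skb-&gt;data directly):

  static int bpf_try_make_writable(struct sk_buff *skb,
                                   unsigned int write_len)
  {
          /* Pulls write_len bytes into the linear area and
           * unclones the skb if needed, so subsequent writes to
           * skb-&gt;data are always safe.
           */
          return skb_ensure_writable(skb, write_len);
  }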

For the above BPF helper functions, this means that after the fixed
bpf_try_make_writable(), we've pulled in enough so that we always
operate on skb-&gt;data. Thus, the calls to skb_header_pointer() and
skb_store_bits() become superfluous. In bpf_skb_store_bytes(), the len
check is unnecessary too, since it can only pass in at most the BPF
stack size, so adding the offset is guaranteed never to overflow. Also,
the bpf_l3/4_csum_replace() helpers must test for proper offset
alignment, since they use a __sum16 pointer for writing the resulting
csum.

The remaining helpers that change skb data and aren't discussed here
yet are bpf_skb_vlan_push(), bpf_skb_vlan_pop() and
bpf_skb_change_proto(). The vlan helpers internally call
skb_ensure_writable() (pop case) or skb_cow_head() (push case, for
head expansion), respectively. Similarly, bpf_skb_proto_xlat() takes
care to not mangle page frags.

Fixes: 608cd71a9c7c ("tc: bpf: generalize pedit action")
Fixes: 91bc4822c3d6 ("tc: bpf: add checksum helpers")
Fixes: 3697649ff29e ("bpf: try harder on clones when writing into skb")
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf: fix bpf_skb_in_cgroup helper naming</title>
<updated>2016-08-13T04:53:33+00:00</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2016-08-12T20:17:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=747ea55e4f78fd980350c39570a986b8c1c3e4aa'/>
<id>urn:sha1:747ea55e4f78fd980350c39570a986b8c1c3e4aa</id>
<content type='text'>
While hashing out BPF's current_task_under_cgroup helper bits, it came
up in discussion that the skb_in_cgroup helper name was a suboptimal
choice.

Tejun says:

  So, I think in_cgroup should mean that the object is in that
  particular cgroup while under_cgroup in the subhierarchy of that
  cgroup. Let's rename the other subhierarchy test to under too. I
  think that'd be a lot less confusing going forward.

  [...]

  It's more intuitive and gives us the room to implement the real
  "in" test if ever necessary in the future.

Since this touches uapi bits, we need to make the change while v4.8 is
not yet officially released. Thus, change the helper enum and rename
the related bits, as sketched below.
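
In sketch form, the uapi-visible part of the rename (same enum slot,
new name):

  -       BPF_FUNC_skb_in_cgroup,
  +       BPF_FUNC_skb_under_cgroup,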

Fixes: 4a482f34afcc ("cgroup: bpf: Add bpf_skb_in_cgroup_proto")
Reference: http://patchwork.ozlabs.org/patch/658500/
Suggested-by: Sargun Dhillon &lt;sargun@sargun.me&gt;
Suggested-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: fix checksum for vlan push/pop helper</title>
<updated>2016-08-08T20:11:43+00:00</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2016-08-04T22:11:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=8065694e6519d234ccee93d952127e3f05b81ba5'/>
<id>urn:sha1:8065694e6519d234ccee93d952127e3f05b81ba5</id>
<content type='text'>
When skbs arrive on ingress with CHECKSUM_COMPLETE, tc BPF programs,
as opposed to some other subsystems (ovs, for example), don't push the
rcsum of the mac header back in before the BPF run and pull it back
out again afterwards.

For cases like q-in-q, meaning when an offloaded vlan tag is already
present and we're about to push another one, skb_vlan_push() pushes the
inner one into the skb, increasing the mac header and
skb_postpush_rcsum()'ing the 4-byte vlan header diff. Likewise, for the
reverse operation in skb_vlan_pop(), where the vlan header needs to be
pulled out of the skb, we're decreasing the mac header and
skb_postpull_rcsum()'ing the 4-byte rcsum of the vlan header that was
removed.

However, mangling the rcsum here will lead to a hw csum failure for
the BPF case, since we're pulling or pushing data that was not part of
the current rcsum. Changing tc BPF programs in general to push/pull
the rcsum around BPF_PROG_RUN() is not really an option either, since
the current behaviour is ABI by now; apart from that, it would also
mean doing quite a bit of useless work, in the sense that usually 12
bytes would need to be rcsum pushed/pulled even when this vlan-related
corner case isn't touched. One way to fix it is to push the necessary
rcsum fixup down into the vlan helpers, which are (mostly) slow-path
anyway, as sketched below.
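
A sketch of that approach (wrapper names illustrative): fold the mac
header into the rcsum around the vlan operation, but only on tc
ingress where CHECKSUM_COMPLETE covers it.

  static void bpf_push_mac_rcsum(struct sk_buff *skb)
  {
          if (skb_at_tc_ingress(skb))
                  skb_postpush_rcsum(skb, skb_mac_header(skb),
                                     skb-&gt;mac_len);
  }

  /* ... and symmetrically bpf_pull_mac_rcsum() with
   * skb_postpull_rcsum(); then, in the vlan push helper:
   */
  bpf_push_mac_rcsum(skb);
  ret = skb_vlan_push(skb, vlan_proto, vlan_tci);
  bpf_pull_mac_rcsum(skb);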

Fixes: 4e10df9a60d9 ("bpf: introduce bpf_skb_vlan_push/pop() helpers")
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf: fix checksum fixups on bpf_skb_store_bytes</title>
<updated>2016-08-08T20:11:43+00:00</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2016-08-04T22:11:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=479ffcccefd7b04442b0e949f8779b0612e8c75f'/>
<id>urn:sha1:479ffcccefd7b04442b0e949f8779b0612e8c75f</id>
<content type='text'>
bpf_skb_store_bytes() invocations above the L2 header need the
BPF_F_RECOMPUTE_CSUM flag for updates, so that CHECKSUM_COMPLETE will
be fixed up along the way. Where we ran into an issue with
bpf_skb_store_bytes() is when we did a single-byte update on the IPv6
hoplimit despite using the BPF_F_RECOMPUTE_CSUM flag; a simple ping via
ICMPv6 triggered a hw csum failure as a result. The underlying problem
was tracked down to buffer alignment.

Meaning, the csum_partial() computations invoked via the
skb_postpull_rcsum()/skb_postpush_rcsum() pair produced a wrong result,
since they operated on an odd address for the hoplimit while other
computations were done on an even address. This mix doesn't work as-is
with that pair, as it always expects at least half-word alignment of
input buffers, which is normally the case. Thus, instead of these
helpers using csum_sub() and (implicitly) csum_add(), we need to use
csum_block_sub() and csum_block_add(), respectively. For unaligned
offsets, they rotate the sum to align it to a half-word boundary
again; otherwise they work the same as csum_sub() and csum_add().

Adding __skb_postpull_rcsum(), __skb_postpush_rcsum() variants that take the
offset as an input and adapting bpf_skb_store_bytes() to them fixes the hw
csum failures again. The skb_postpull_rcsum(), skb_postpush_rcsum() helpers
use a 0 constant for offset so that the compiler optimizes the offset &amp; 1
test away and generates the same code as with csum_sub()/_add().
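
A sketch of the offset-aware pull variant (assumed shape; the push
side is symmetric with csum_block_add()):

  static inline void __skb_postpull_rcsum(struct sk_buff *skb,
                                          const void *start,
                                          unsigned int len,
                                          unsigned int off)
  {
          if (skb-&gt;ip_summed == CHECKSUM_COMPLETE)
                  skb-&gt;csum = csum_block_sub(skb-&gt;csum,
                                             csum_partial(start, len, 0),
                                             off);
  }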

Fixes: 608cd71a9c7c ("tc: bpf: generalize pedit action")
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf: also call skb_postpush_rcsum on xmit occasions</title>
<updated>2016-08-08T20:11:43+00:00</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2016-08-04T22:11:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=a2bfe6bf09a5f38df2bd0b1734f5fdee76f9366f'/>
<id>urn:sha1:a2bfe6bf09a5f38df2bd0b1734f5fdee76f9366f</id>
<content type='text'>
Follow-up to commit f8ffad69c9f8 ("bpf: add skb_postpush_rcsum and fix
dev_forward_skb occasions") to fix an issue for dev_queue_xmit() redirect
locations which need CHECKSUM_COMPLETE fixups on ingress.

For the same reasons as described in f8ffad69c9f8 already, we of course
also need this here, since dev_queue_xmit() on a veth device will let us
end up in the dev_forward_skb() helper again to cross namespaces.

The latter then calls into skb_postpull_rcsum() to pull out the L2
header, so that netif_rx_internal() sees CHECKSUM_COMPLETE as
expected. That is, CHECKSUM_COMPLETE on ingress covering the L2
_payload_, not the L2 headers.

Here, too, we have to address bpf_redirect() and bpf_clone_redirect(),
as sketched below.
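
A sketch of the xmit-side fix (surrounding code illustrative): push
the mac header rcsum back in right before handing the skb to
dev_queue_xmit().

  /* In the redirect helpers' xmit path: */
  skb2-&gt;dev = dev;
  skb_postpush_rcsum(skb2, skb_mac_header(skb2), skb2-&gt;mac_len);
  return dev_queue_xmit(skb2);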

Fixes: 3896d655f4d4 ("bpf: introduce bpf_clone_redirect() helper")
Fixes: 27b29f63058d ("bpf: add bpf_redirect() helper")
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>Merge branch 'salted-string-hash'</title>
<updated>2016-07-28T19:26:31+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2016-07-28T19:26:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=554828ee0db41618d101d9549db8808af9fd9d65'/>
<id>urn:sha1:554828ee0db41618d101d9549db8808af9fd9d65</id>
<content type='text'>
This changes the vfs dentry hashing to mix in the parent pointer at the
_beginning_ of the hash, rather than at the end.

That actually improves both the hash and the code generation, because we
can move more of the computation to the "static" part of the dcache
setup, and do less at lookup runtime.

It turns out that a lot of other hash users also really wanted to mix in
a base pointer as a 'salt' for the hash, and so the slightly extended
interface ends up working well for other cases too.

Users that want a string hash that is purely about the string pass in a
'salt' pointer of NULL.
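
In sketch form, the extended interface as seen by dcache users versus
plain string users (prototype shape per this merge):

  /* dcache: mix the parent dentry in as salt, up front */
  hash = full_name_hash(parent, name, len);

  /* purely-string users pass a NULL salt */
  hash = full_name_hash(NULL, name, len);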

* merge branch 'salted-string-hash':
  fs/dcache.c: Save one 32-bit multiply in dcache lookup
  vfs: make the string hashes salt the hash
</content>
</entry>
</feed>
