<feed xmlns='http://www.w3.org/2005/Atom'>
<title>blackbird-op-linux/net/netfilter, branch master</title>
<subtitle>Blackbird™ Linux sources for OpenPOWER</subtitle>
<id>https://git.raptorcs.com/git/blackbird-op-linux/atom?h=master</id>
<link rel='self' href='https://git.raptorcs.com/git/blackbird-op-linux/atom?h=master'/>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/blackbird-op-linux/'/>
<updated>2020-02-04T13:32:20+00:00</updated>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net</title>
<updated>2020-02-04T13:32:20+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-02-04T13:32:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/blackbird-op-linux/commit/?id=33b40134e5cfbbccad7f3040d1919889537a3df7'/>
<id>urn:sha1:33b40134e5cfbbccad7f3040d1919889537a3df7</id>
<content type='text'>
Pull networking fixes from David Miller:

 1) Use after free in rxrpc_put_local(), from David Howells.

 2) Fix 64-bit division error in mlxsw, from Nathan Chancellor.

 3) Make sure we clear various bits of TCP state in response to
    tcp_disconnect(). From Eric Dumazet.

 4) Fix netlink attribute policy in cls_rsvp, from Eric Dumazet.

 5) txtimer must be deleted in stmmac suspend(), from Nicolin Chen.

 6) Fix TC queue mapping in bnxt_en driver, from Michael Chan.

 7) Various netdevsim fixes from Taehee Yoo (use of uninitialized data,
    snapshot panics, stack out of bounds, etc.)

 8) cls_tcindex changes hash table size after allocating the table, fix
    from Cong Wang.

 9) Fix regression in the enforcement of session ID uniqueness in l2tp.
    We only have to enforce uniqueness for IP-based tunnels, not UDP
    ones. From Ridge Kennedy.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (46 commits)
  gtp: use __GFP_NOWARN to avoid memalloc warning
  l2tp: Allow duplicate session creation with UDP
  r8152: Add MAC passthrough support to new device
  net_sched: fix an OOB access in cls_tcindex
  qed: Remove set but not used variable 'p_link'
  tc-testing: add missing 'nsPlugin' to basic.json
  tc-testing: fix eBPF tests failure on linux fresh clones
  net: hsr: fix possible NULL deref in hsr_handle_frame()
  netdevsim: remove unused sdev code
  netdevsim: use __GFP_NOWARN to avoid memalloc warning
  netdevsim: use IS_ERR instead of IS_ERR_OR_NULL for debugfs
  netdevsim: fix stack-out-of-bounds in nsim_dev_debugfs_init()
  netdevsim: fix panic in nsim_dev_take_snapshot_write()
  netdevsim: disable devlink reload when resources are being used
  netdevsim: fix using uninitialized resources
  bnxt_en: Fix TC queue mapping.
  bnxt_en: Fix logic that disables Bus Master during firmware reset.
  bnxt_en: Fix RDMA driver failure with SRIOV after firmware reset.
  bnxt_en: Refactor logic to re-enable SRIOV after firmware reset detected.
  net: stmmac: Delete txtimer in suspend()
  ...
</content>
</entry>
<entry>
<title>proc: convert everything to "struct proc_ops"</title>
<updated>2020-02-04T03:05:26+00:00</updated>
<author>
<name>Alexey Dobriyan</name>
<email>adobriyan@gmail.com</email>
</author>
<published>2020-02-04T01:37:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/blackbird-op-linux/commit/?id=97a32539b9568bb653683349e5a76d02ff3c3e2c'/>
<id>urn:sha1:97a32539b9568bb653683349e5a76d02ff3c3e2c</id>
<content type='text'>
The most notable change is DEFINE_SHOW_ATTRIBUTE macro split in
seq_file.h.

Conversion rule is:

	llseek		=&gt; proc_lseek
	unlocked_ioctl	=&gt; proc_ioctl

	xxx		=&gt; proc_xxx

	delete ".owner = THIS_MODULE" line
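
As an illustration only, the conversion rule above can be sketched as a
small renaming function in Python. The names SPECIAL and convert_field
are hypothetical and are not part of this patch.

```python
# Hypothetical helper illustrating the renaming rule; not part of the patch.
SPECIAL = {
    "llseek": "proc_lseek",
    "unlocked_ioctl": "proc_ioctl",
}


def convert_field(name):
    """Map a file_operations field name to its proc_ops counterpart.

    Returns None for "owner", because that line is deleted.
    """
    if name == "owner":
        return None
    # Every other field simply gains the "proc_" prefix.
    return SPECIAL.get(name, "proc_" + name)


for field in ("open", "read", "llseek", "unlocked_ioctl", "release", "owner"):
    print(field, "becomes", convert_field(field))
```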

[akpm@linux-foundation.org: fix drivers/isdn/capi/kcapi_proc.c]
[sfr@canb.auug.org.au: fix kernel/sched/psi.c]
  Link: http://lkml.kernel.org/r/20200122180545.36222f50@canb.auug.org.au
Link: http://lkml.kernel.org/r/20191225172546.GB13378@avx2
Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Signed-off-by: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>netfilter: flowtable: Fix setting forgotten NF_FLOW_HW_DEAD flag</title>
<updated>2020-01-31T18:31:42+00:00</updated>
<author>
<name>Paul Blakey</name>
<email>paulb@mellanox.com</email>
</author>
<published>2020-01-30T16:04:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/blackbird-op-linux/commit/?id=c22208b7ce3ef0c2c184ff0d9f6423614b1799d9'/>
<id>urn:sha1:c22208b7ce3ef0c2c184ff0d9f6423614b1799d9</id>
<content type='text'>
During the refactor, setting this flag was accidentally removed.

Fixes: ae29045018c8 ("netfilter: flowtable: add nf_flow_offload_tuple() helper")
Signed-off-by: Paul Blakey &lt;paulb@mellanox.com&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>netfilter: flowtable: Fix missing flush hardware on table free</title>
<updated>2020-01-31T18:31:41+00:00</updated>
<author>
<name>Paul Blakey</name>
<email>paulb@mellanox.com</email>
</author>
<published>2020-01-30T16:04:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/blackbird-op-linux/commit/?id=0f34f30a1be80f3f59efeaab596396bc698e7337'/>
<id>urn:sha1:0f34f30a1be80f3f59efeaab596396bc698e7337</id>
<content type='text'>
If entries exist when freeing a hardware offload enabled table,
we queue work for hardware while running the gc iteration.

Execute it (flush) after queueing.

Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support")
Signed-off-by: Paul Blakey &lt;paulb@mellanox.com&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>netfilter: flowtable: Fix hardware flush order on nf_flow_table_cleanup</title>
<updated>2020-01-31T18:31:40+00:00</updated>
<author>
<name>Paul Blakey</name>
<email>paulb@mellanox.com</email>
</author>
<published>2020-01-30T16:04:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/blackbird-op-linux/commit/?id=91bfaa15a379e9af24f71fb4ee08d8019b6e8ec7'/>
<id>urn:sha1:91bfaa15a379e9af24f71fb4ee08d8019b6e8ec7</id>
<content type='text'>
On netdev down event, nf_flow_table_cleanup() is called for the relevant
device and it cleans all the tables that are on that device.
If one of those tables has hardware offload flag,
nf_flow_table_iterate_cleanup flushes hardware and then runs the gc.
But the gc can queue more hardware work, which will take time to execute.

Instead first add the work, then flush it, to execute it now.

Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support")
Signed-off-by: Paul Blakey &lt;paulb@mellanox.com&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>netfilter: Use kvcalloc</title>
<updated>2020-01-31T18:30:54+00:00</updated>
<author>
<name>Joe Perches</name>
<email>joe@perches.com</email>
</author>
<published>2020-01-28T19:07:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/blackbird-op-linux/commit/?id=b9e0102a57d768bdb99cbbfa01225f73d58e03bc'/>
<id>urn:sha1:b9e0102a57d768bdb99cbbfa01225f73d58e03bc</id>
<content type='text'>
Convert the uses of kvmalloc_array with __GFP_ZERO to
the equivalent kvcalloc.

Signed-off-by: Joe Perches &lt;joe@perches.com&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>netfilter: ipset: fix suspicious RCU usage in find_set_and_id</title>
<updated>2020-01-29T17:34:46+00:00</updated>
<author>
<name>Kadlecsik József</name>
<email>kadlec@blackhole.kfki.hu</email>
</author>
<published>2020-01-25T19:39:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/blackbird-op-linux/commit/?id=5038517119d50ed0240059b1d7fc2faa92371c08'/>
<id>urn:sha1:5038517119d50ed0240059b1d7fc2faa92371c08</id>
<content type='text'>
find_set_and_id() is called when the NFNL_SUBSYS_IPSET mutex is held.
However, in the error path there can be a follow-up recvmsg() without
the mutex held. Use the start() function of struct netlink_dump_control
instead of dump() to verify and report if the specified set does not
exist.

Thanks to Pablo Neira Ayuso for helping me to understand the subtleties
of the netlink protocol.

Reported-by: syzbot+fc69d7cb21258ab4ae4d@syzkaller.appspotmail.com
Signed-off-by: Jozsef Kadlecsik &lt;kadlec@netfilter.org&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>nf_tables: Add set type for arbitrary concatenation of ranges</title>
<updated>2020-01-27T07:54:30+00:00</updated>
<author>
<name>Stefano Brivio</name>
<email>sbrivio@redhat.com</email>
</author>
<published>2020-01-21T23:17:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/blackbird-op-linux/commit/?id=3c4287f62044a90e73a561aa05fc46e62da173da'/>
<id>urn:sha1:3c4287f62044a90e73a561aa05fc46e62da173da</id>
<content type='text'>
This new set type allows for intervals in concatenated fields.
Fields are expressed in the usual way, that is, simple byte
concatenation with padding to 32 bits for single fields. Ranges
are given by specifying start and end elements, each containing
the full concatenation of start and end values for the single
fields.

Ranges are expanded to composing netmasks, for each field: these
are inserted as rules in per-field lookup tables. Bits to be
classified are divided into 4-bit groups, and for each group, the
lookup table contains 2^4 = 16 buckets, representing all the possible
values of a bit group. This approach was inspired by the Grouper
algorithm:
	http://www.cse.usf.edu/~ligatti/projects/grouper/
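
As a rough illustration of the range expansion described above, the
following Python sketch splits an inclusive integer range into aligned
power-of-two blocks, each of which a netmask can express. The function
name expand_range and the 16-bit default width are assumptions made for
this example; the code in nft_set_pipapo.c may proceed differently.

```python
# operator.le stands in for the comparison symbol, which would need
# escaping inside this feed.
from operator import le


def expand_range(start, end, width=16):
    """Split the inclusive range [start, end] into (base, prefix_len) pairs.

    Each pair covers an aligned block of 2 ** (width - prefix_len) values.
    """
    if not (le(0, start) and le(start, end) and le(end, 2 ** width - 1)):
        raise ValueError("range does not fit the field width")
    blocks = []
    while le(start, end):
        size = 1
        # Grow the block while it stays aligned and inside the range.
        while start % (size * 2) == 0 and le(start + size * 2 - 1, end):
            size *= 2
        prefix_len = width - (size.bit_length() - 1)
        blocks.append((start, prefix_len))
        start += size
    return blocks


print(expand_range(22, 25))
```

For the port range 22 to 25, this yields two blocks with a 15-bit
prefix, based at 22 and at 24.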

Matching is performed by a sequence of AND operations between
bucket values, with buckets selected according to the value of
packet bits, for each group. The result of this sequence tells
us which rules matched for a given field.
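
A minimal Python sketch of the lookup described above, for a single
8-bit field split into two 4-bit groups. Rules are given here as
(value, mask) pairs and the result is a bitmap of matching rule
indices. The names groups_of, build_table, lookup, RULES and TABLE are
hypothetical, and the layout is simplified compared to
nft_set_pipapo.c.

```python
# operator.and_ / operator.or_ are the bitwise AND / OR operations.
from operator import and_, or_

GROUP_BITS = 4
BUCKETS = 2 ** GROUP_BITS  # 16 possible values per 4-bit group


def groups_of(value, n_groups):
    """Return the 4-bit groups of value, most significant group first."""
    return [(value // BUCKETS ** i) % BUCKETS for i in reversed(range(n_groups))]


def build_table(rules, n_groups):
    """Build table[group][bucket], a bitmap of the rules each bucket accepts."""
    table = [[0] * BUCKETS for _ in range(n_groups)]
    for idx, (value, mask) in enumerate(rules):
        vals = groups_of(value, n_groups)
        masks = groups_of(mask, n_groups)
        for g in range(n_groups):
            for b in range(BUCKETS):
                # Bucket b accepts the rule if it agrees on all masked bits.
                if and_(b, masks[g]) == and_(vals[g], masks[g]):
                    table[g][b] = or_(table[g][b], 2 ** idx)
    return table


def lookup(table, packet_value):
    """AND together the buckets selected by each 4-bit group of the packet."""
    result = -1  # in Python, -1 behaves as an all-ones bitmap
    for g, nibble in enumerate(groups_of(packet_value, len(table))):
        result = and_(result, table[g][nibble])
    return result


# Three example rules: 16..31, 22..23 and exactly 24.
RULES = [(0x10, 0xF0), (0x16, 0xFE), (0x18, 0xFF)]
TABLE = build_table(RULES, 2)
print(bin(lookup(TABLE, 23)), bin(lookup(TABLE, 24)), bin(lookup(TABLE, 32)))
```

With these example rules, value 23 matches rules 0 and 1, value 24
matches rules 0 and 2, and value 32 matches none.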

In order to concatenate several ranged fields, per-field rules
are mapped using mapping arrays, one per field, that specify
which rules should be considered while matching the next field.
The mapping array for the last field contains a reference to
the element originally inserted.

The notes in nft_set_pipapo.c cover the algorithm in deeper
detail.

A pure hash-based approach is of no use here, as ranges need
to be classified. An implementation based on "proxying" the
existing red-black tree set type, creating a tree for each
field, was considered, but deemed impractical due to the fact
that elements would need to be shared between trees, at least
as long as we want to keep UAPI changes to a minimum.

A stand-alone implementation of this algorithm is available at:
	https://pipapo.lameexcu.se
together with notes about possible future optimisations
(in pipapo.c).

This algorithm was designed with data locality in mind, and can
be highly optimised for SIMD instruction sets, as the bulk of
the matching work is done with repetitive, simple bitwise
operations.

At this point, without further optimisations, nft_concat_range.sh
reports, for one AMD Epyc 7351 thread (2.9GHz, 512 KiB L1D$, 8 MiB
L2$):

TEST: performance
  net,port                                                      [ OK ]
    baseline (drop from netdev hook):              10190076pps
    baseline hash (non-ranged entries):             6179564pps
    baseline rbtree (match on first field only):    2950341pps
    set with  1000 full, ranged entries:            2304165pps
  port,net                                                      [ OK ]
    baseline (drop from netdev hook):              10143615pps
    baseline hash (non-ranged entries):             6135776pps
    baseline rbtree (match on first field only):    4311934pps
    set with   100 full, ranged entries:            4131471pps
  net6,port                                                     [ OK ]
    baseline (drop from netdev hook):               9730404pps
    baseline hash (non-ranged entries):             4809557pps
    baseline rbtree (match on first field only):    1501699pps
    set with  1000 full, ranged entries:            1092557pps
  port,proto                                                    [ OK ]
    baseline (drop from netdev hook):              10812426pps
    baseline hash (non-ranged entries):             6929353pps
    baseline rbtree (match on first field only):    3027105pps
    set with 30000 full, ranged entries:             284147pps
  net6,port,mac                                                 [ OK ]
    baseline (drop from netdev hook):               9660114pps
    baseline hash (non-ranged entries):             3778877pps
    baseline rbtree (match on first field only):    3179379pps
    set with    10 full, ranged entries:            2082880pps
  net6,port,mac,proto                                           [ OK ]
    baseline (drop from netdev hook):               9718324pps
    baseline hash (non-ranged entries):             3799021pps
    baseline rbtree (match on first field only):    1506689pps
    set with  1000 full, ranged entries:             783810pps
  net,mac                                                       [ OK ]
    baseline (drop from netdev hook):              10190029pps
    baseline hash (non-ranged entries):             5172218pps
    baseline rbtree (match on first field only):    2946863pps
    set with  1000 full, ranged entries:            1279122pps

v4:
 - fix build for 32-bit architectures: 64-bit division needs
   div_u64() (kbuild test robot &lt;lkp@intel.com&gt;)
v3:
 - rework interface for field length specification,
   NFT_SET_SUBKEY disappears and information is stored in
   description
 - remove scratch area to store closing element of ranges,
   as elements now come with an actual attribute to specify
   the upper range limit (Pablo Neira Ayuso)
 - also remove pointer to 'start' element from mapping table,
   closing key is now accessible via extension data
 - use bytes right away instead of bits for field lengths,
   this way we can also double the inner loop of the lookup
   function to take care of upper and lower bits in a single
   iteration (minor performance improvement)
 - make it clearer that set operations are actually atomic
   API-wise, but we can't e.g. implement flush() as one-shot
   action
 - fix type for 'dup' in nft_pipapo_insert(), check for
   duplicates only in the next generation, and in general take
   care of differentiating generation mask cases depending on
   the operation (Pablo Neira Ayuso)
 - report C implementation matching rate in commit message, so
   that AVX2 implementation can be compared (Pablo Neira Ayuso)
v2:
 - protect access to scratch maps in nft_pipapo_lookup() with
   local_bh_disable/enable() (Florian Westphal)
 - drop rcu_read_lock/unlock() from nft_pipapo_lookup(), it's
   already implied (Florian Westphal)
 - explain why partial allocation failures don't need handling
   in pipapo_realloc_scratch(), rename 'm' to clone and update
   related kerneldoc to make it clear we're not operating on
   the live copy (Florian Westphal)
 - add explicit check for priv-&gt;start_elem in
   nft_pipapo_insert() to avoid ending up in nft_pipapo_walk()
   with a NULL start element, and also zero it out in every
   operation that might make it invalid, so that insertion
   doesn't proceed with an invalid element (Florian Westphal)

Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>netfilter: nf_tables: Support for sets with multiple ranged fields</title>
<updated>2020-01-27T07:54:30+00:00</updated>
<author>
<name>Stefano Brivio</name>
<email>sbrivio@redhat.com</email>
</author>
<published>2020-01-21T23:17:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/blackbird-op-linux/commit/?id=f3a2181e16f1dcbf5446ed43f6b5d9f56c459f85'/>
<id>urn:sha1:f3a2181e16f1dcbf5446ed43f6b5d9f56c459f85</id>
<content type='text'>
Introduce a new nested netlink attribute, NFTA_SET_DESC_CONCAT, used
to specify the length of each field in a set concatenation.

This allows set implementations to support concatenation of multiple
ranged items, as they can divide the input key into matching data for
every single field. Such set implementations would be selected as
they specify support for NFT_SET_INTERVAL and allow desc-&gt;field_count
to be greater than one. Explicitly disallow this for nft_set_rbtree.

In order to specify the interval for a set entry, userspace would
include field lengths in NFTA_SET_DESC_CONCAT attributes, and pass
range endpoints as two separate keys, represented by attributes
NFTA_SET_ELEM_KEY and NFTA_SET_ELEM_KEY_END.

While at it, export the number of 32-bit registers available for
packet matching, as nftables will need this to know the maximum
number of field lengths that can be specified.

For example, "packets with an IPv4 address between 192.0.2.0 and
192.0.2.42, with destination port between 22 and 25", can be
expressed as two concatenated elements:

  NFTA_SET_ELEM_KEY:            192.0.2.0 . 22
  NFTA_SET_ELEM_KEY_END:        192.0.2.42 . 25

and NFTA_SET_DESC_CONCAT attribute would contain:

  NFTA_LIST_ELEM
    NFTA_SET_FIELD_LEN:		4
  NFTA_LIST_ELEM
    NFTA_SET_FIELD_LEN:		2
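
As an illustration of the example above, the following Python sketch
builds the two concatenated keys as byte strings. It assumes each field
is padded to a multiple of 32 bits with trailing zero bytes and is in
network byte order; the helper names pad32 and concat_key are
hypothetical and do not correspond to any nftables or libnftnl function.

```python
import ipaddress
import struct


def pad32(raw):
    """Pad a field to a multiple of 4 bytes (32 bits) with trailing zeros."""
    return raw + b"\x00" * (-len(raw) % 4)


def concat_key(addr, port):
    """Build a concatenated key from an IPv4 address and a port."""
    addr_bytes = ipaddress.IPv4Address(addr).packed  # 4 bytes, network order
    port_bytes = struct.pack("!H", port)             # 2 bytes, network order
    return pad32(addr_bytes) + pad32(port_bytes)


# Unpadded field lengths, as carried in NFTA_SET_FIELD_LEN.
FIELD_LENS = [4, 2]

key = concat_key("192.0.2.0", 22)       # NFTA_SET_ELEM_KEY
key_end = concat_key("192.0.2.42", 25)  # NFTA_SET_ELEM_KEY_END
print(key.hex(), key_end.hex())
```

Under the padding assumption stated above, each key is 8 bytes long:
4 bytes of address, then 2 bytes of port followed by 2 bytes of padding.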

v4: No changes
v3: Complete rework, NFTA_SET_DESC_CONCAT instead of NFTA_SET_SUBKEY
v2: No changes

Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>netfilter: nf_tables: add NFTA_SET_ELEM_KEY_END attribute</title>
<updated>2020-01-27T07:54:30+00:00</updated>
<author>
<name>Pablo Neira Ayuso</name>
<email>pablo@netfilter.org</email>
</author>
<published>2020-01-21T23:17:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/blackbird-op-linux/commit/?id=7b225d0b5c6dda5fefab578175f210c6fc7e389a'/>
<id>urn:sha1:7b225d0b5c6dda5fefab578175f210c6fc7e389a</id>
<content type='text'>
Add NFTA_SET_ELEM_KEY_END attribute to convey the closing element of the
interval between kernel and userspace.

This patch also adds the NFT_SET_EXT_KEY_END extension to store the
closing element value in this interval.

v4: No changes
v3: New patch

[sbrivio: refactor error paths and labels; add corresponding
  nft_set_ext_type for new key; rebase]
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
</feed>
