packet: packet fanout rollover during socket overload

Changes:
  v3->v2: rebase (no other changes)
          passes selftest
  v2->v1: read f->num_members only once
          fix bug: test rollover mode + flag

Minimize packet drop in a fanout group. If one socket is full,
roll over packets to another from the group. Maintain flow
affinity during normal load using an rxhash fanout policy, while
dispersing unexpected traffic storms that hit a single cpu, such
as spoofed-source DoS flows. Rollover breaks affinity for flows
arriving at saturated sockets during those conditions.

The patch adds a fanout policy ROLLOVER that rotates between sockets,
filling each socket before moving to the next. It also adds a fanout
flag ROLLOVER. If passed along with any other fanout policy, the
primary policy is applied until the chosen socket is full. Then,
rollover selects another socket, to delay packet drop until the
entire system is saturated.

Probing sockets is not free. Selecting the last used socket, as
rollover does, is a greedy approach that maximizes the chance of
success, at the cost of extreme load imbalance. In practice, with
sufficiently long queues to absorb bursts, sockets are drained in
parallel and load balance looks uniform in `top`.

To avoid contention, the patch scales counters with the number of
sockets and accesses them lock-free. Values are bounds checked to
ensure correctness.

Tested using an application with 9 threads pinned to CPUs, one socket
per thread and sufficient busywork per packet operation to limit each
thread to handling 32 Kpps. When sent 500 Kpps single UDP stream
packets, a FANOUT_CPU setup processes 32 Kpps in total without this
patch, 270 Kpps with the patch. Tested with read() and with a packet
ring (V1). Also passes the psock_fanout.c unit test added to selftests.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
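The selection logic described above can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel code from this patch: the `sim_` names, `SIM_FANOUT_MAX`, and the queue model (a fill counter compared against a capacity) are all hypothetical. It shows the two modes from the description: with the flag, the primary choice is tried first and then skipped while probing; with the policy, probing simply resumes at the last socket that had room.

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_FANOUT_MAX 8	/* hypothetical stand-in for PACKET_FANOUT_MAX */

/* Hypothetical socket model: a receive queue with a fill level. */
struct sim_sock {
	unsigned int queued;
	unsigned int capacity;
};

struct sim_fanout {
	unsigned int num_members;
	struct sim_sock arr[SIM_FANOUT_MAX];
	int next[SIM_FANOUT_MAX];	/* per-index hint: last socket that had room */
};

static bool sim_has_room(const struct sim_sock *sk)
{
	return sk->queued < sk->capacity;
}

/*
 * Pick a socket for a packet whose primary choice is idx.
 * try_self == true models the ROLLOVER flag: use idx while it has room,
 * otherwise probe the other members. try_self == false models the
 * ROLLOVER policy: probe from the stored hint without skipping anyone.
 * If every candidate is full, idx is returned and the packet would drop.
 */
static unsigned int sim_rollover(struct sim_fanout *f, unsigned int idx,
				 bool try_self)
{
	unsigned int num = f->num_members;	/* read only once */
	unsigned int i, start;

	assert(num > 0 && num <= SIM_FANOUT_MAX && idx < num);

	if (try_self && sim_has_room(&f->arr[idx]))
		return idx;

	/* Bounds check the lock-free hint before using it as an index. */
	start = (unsigned int)f->next[idx];
	if (start >= num)
		start = 0;

	i = start;
	do {
		bool skip = try_self && i == idx;

		if (!skip && sim_has_room(&f->arr[i])) {
			f->next[idx] = (int)i;	/* greedy: remember the hit */
			return i;
		}
		if (++i == num)
			i = 0;
	} while (i != start);

	return idx;
}
```

Remembering the last successful index keeps the number of probes low, at the cost of the load imbalance the description mentions: one socket is filled completely before the next one is tried.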
author: Willem de Bruijn <willemb@google.com> 2013-03-19 10:18:11 +0000
committer: David S. Miller <davem@davemloft.net> 2013-03-19 17:15:04 -0400
commit: 77f65ebdca506870d99bfabe52bde222511022ec (patch)
tree: 8f5ba6c76d1b49b44128d08281cc0b6f3e62285c /net/packet/internal.h
parent: b0aa73bf081da6810dacd750b9f8186640e172db (diff)
1 files changed, 2 insertions, 1 deletions
diff --git a/net/packet/internal.h b/net/packet/internal.h
index e84cab8cb7a9..e891f025a1b9 100644
--- a/net/packet/internal.h
+++ b/net/packet/internal.h
@@ -77,10 +77,11 @@ struct packet_fanout {
 	unsigned int		num_members;
 	u16			id;
 	u8			type;
-	u8			defrag;
+	u8			flags;
 	atomic_t		rr_cur;
 	struct list_head	list;
 	struct sock		*arr[PACKET_FANOUT_MAX];
+	int			next[PACKET_FANOUT_MAX];
 	spinlock_t		lock;
 	atomic_t		sk_ref;
 	struct packet_type	prot_hook ____cacheline_aligned_in_smp;
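The rename of `defrag` to `flags` suggests the field now carries more than one option bit. The sketch below shows one way a userspace fanout argument could be packed and split into `id`, `type`, and `flags`. It assumes the layout documented for `PACKET_FANOUT` (group id in the low 16 bits, type and flags in the high 16 bits) and that the flag bits occupy the upper byte of that half; the constants mirror the values in `<linux/if_packet.h>` but are redefined under a hypothetical `SKETCH_` prefix so the block compiles on any system. The helper names are illustrative and do not come from this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match <linux/if_packet.h>; redefined here for portability. */
#define SKETCH_FANOUT_HASH		0
#define SKETCH_FANOUT_ROLLOVER		3
#define SKETCH_FANOUT_FLAG_ROLLOVER	0x1000
#define SKETCH_FANOUT_FLAG_DEFRAG	0x8000

/* Hypothetical mirror of the id/type/flags fields in struct packet_fanout. */
struct sketch_fanout_cfg {
	uint16_t id;
	uint8_t type;
	uint8_t flags;
};

/* Build the 32-bit option value: id in the low half, type|flags in the high. */
static uint32_t sketch_pack(uint16_t id, uint16_t type_flags)
{
	return (uint32_t)id | ((uint32_t)type_flags << 16);
}

/* Split the option value into the three narrow fields. */
static struct sketch_fanout_cfg sketch_unpack(uint32_t arg)
{
	uint16_t type_flags = (uint16_t)(arg >> 16);
	struct sketch_fanout_cfg cfg = {
		.id = (uint16_t)(arg & 0xffff),
		.type = (uint8_t)(type_flags & 0xff),
		.flags = (uint8_t)(type_flags >> 8),
	};

	return cfg;
}

/* Test one of the SKETCH_FANOUT_FLAG_* bits against the u8 flags field. */
static int sketch_has_flag(const struct sketch_fanout_cfg *cfg, uint16_t flag)
{
	return (cfg->flags & (flag >> 8)) != 0;
}
```

Under these assumptions, a hash policy combined with the rollover flag would be requested as `sketch_pack(id, SKETCH_FANOUT_HASH | SKETCH_FANOUT_FLAG_ROLLOVER)`, and a single `u8` is enough to hold both the defrag and rollover bits.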