summaryrefslogtreecommitdiffstats
path: root/net/xfrm
Commit message (Collapse)AuthorAgeFilesLines
* xfrm: Fix crash observed during device unregistration and decryptionsubashab@codeaurora.org2016-03-241-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A crash is observed when a decrypted packet is processed in receive path. get_rps_cpus() tries to dereference the skb->dev fields but it appears that the device is freed from the poison pattern. [<ffffffc000af58ec>] get_rps_cpu+0x94/0x2f0 [<ffffffc000af5f94>] netif_rx_internal+0x140/0x1cc [<ffffffc000af6094>] netif_rx+0x74/0x94 [<ffffffc000bc0b6c>] xfrm_input+0x754/0x7d0 [<ffffffc000bc0bf8>] xfrm_input_resume+0x10/0x1c [<ffffffc000ba6eb8>] esp_input_done+0x20/0x30 [<ffffffc0000b64c8>] process_one_work+0x244/0x3fc [<ffffffc0000b7324>] worker_thread+0x2f8/0x418 [<ffffffc0000bb40c>] kthread+0xe0/0xec -013|get_rps_cpu( | dev = 0xFFFFFFC08B688000, | skb = 0xFFFFFFC0C76AAC00 -> ( | dev = 0xFFFFFFC08B688000 -> ( | name = "...................................................... | name_hlist = (next = 0xAAAAAAAAAAAAAAAA, pprev = 0xAAAAAAAAAAA Following are the sequence of events observed - - Encrypted packet in receive path from netdevice is queued - Encrypted packet queued for decryption (asynchronous) - Netdevice brought down and freed - Packet is decrypted and returned through callback in esp_input_done - Packet is queued again for process in network stack using netif_rx Since the device appears to have been freed, the dereference of skb->dev in get_rps_cpus() leads to an unhandled page fault exception. Fix this by holding on to device reference when queueing packets asynchronously and releasing the reference on call back return. v2: Make the change generic to xfrm as mentioned by Steffen and update the title to xfrm Suggested-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Jerome Stanislaus <jeromes@codeaurora.org> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/xfrm_user: use in_compat_syscall to deny compat syscallsAndy Lutomirski2016-03-221-1/+1
| | | | | | | | | | | | The code wants to prevent compat code from receiving messages. Use in_compat_syscall for this. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ipsec: Use skcipher and ahash when probing algorithmsHerbert Xu2016-01-271-3/+4
| | | | | | | | | | | This patch removes the last reference to hash and ablkcipher from IPsec and replaces them with ahash and skcipher respectively. For skcipher there is currently no difference at all, while for ahash the current code is actually buggy and would prevent asynchronous algorithms from being discovered. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: David S. Miller <davem@davemloft.net>
* net: preserve IP control block during GSO segmentationKonstantin Khlebnikov2016-01-151-0/+2
| | | | | | | | | | | | | | Skb_gso_segment() uses skb control block during segmentation. This patch adds 32-bytes room for previous control block which will be copied into all resulting segments. This patch fixes kernel crash during fragmenting forwarded packets. Fragmentation requires valid IP CB in skb for clearing ip options. Also patch removes custom save/restore in ovs code, now it's redundant. Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Link: http://lkml.kernel.org/r/CALYGNiP-0MZ-FExV2HutTvE9U-QQtkKSoE--KN=JQE5STYsjAA@mail.gmail.com Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'master' of ↵David S. Miller2015-12-221-38/+0
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2015-12-22 Just one patch to fix dst_entries_init with multiple namespaces. From Dan Streetman. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * xfrm: dst_entries_init() per-net dst_opsDan Streetman2015-11-031-38/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove the dst_entries_init/destroy calls for xfrm4 and xfrm6 dst_ops templates; their dst_entries counters will never be used. Move the xfrm dst_ops initialization from the common xfrm/xfrm_policy.c to xfrm4/xfrm4_policy.c and xfrm6/xfrm6_policy.c, and call dst_entries_init and dst_entries_destroy for each net namespace. The ipv4 and ipv6 xfrms each create dst_ops template, and perform dst_entries_init on the templates. The template values are copied to each net namespace's xfrm.xfrm*_dst_ops. The problem there is the dst_ops pcpuc_entries field is a percpu counter and cannot be used correctly by simply copying it to another object. The result of this is a very subtle bug; changes to the dst entries counter from one net namespace may sometimes get applied to a different net namespace dst entries counter. This is because of how the percpu counter works; it has a main count field as well as a pointer to the percpu variables. Each net namespace maintains its own main count variable, but all point to one set of percpu variables. When any net namespace happens to change one of the percpu variables to outside its small batch range, its count is moved to the net namespace's main count variable. So with multiple net namespaces operating concurrently, the dst_ops entries counter can stray from the actual value that it should be; if counts are consistently moved from one net namespace to another (which my testing showed is likely), then one net namespace winds up with a negative dst_ops count while another winds up with a continually increasing count, eventually reaching its gc_thresh limit, which causes all new traffic on the net namespace to fail with -ENOBUFS. Signed-off-by: Dan Streetman <dan.streetman@canonical.com> Signed-off-by: Dan Streetman <ddstreet@ieee.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
* | xfrm: add rcu protection to sk->sk_policy[]Eric Dumazet2015-12-111-12/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | XFRM can deal with SYNACK messages, sent while listener socket is not locked. We add proper rcu protection to __xfrm_sk_clone_policy() and xfrm_sk_policy_lookup() This might serve as the first step to remove xfrm.xfrm_policy_lock use in fast path. Fixes: fa76ce7328b2 ("inet: get rid of central tcp/dccp listener timer") Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | xfrm: add rcu grace period in xfrm_policy_destroy()Eric Dumazet2015-12-111-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | We will soon switch sk->sk_policy[] to RCU protection, as SYNACK packets are sent while listener socket is not locked. This patch simply adds RCU grace period before struct xfrm_policy freeing, and the corresponding rcu_head in struct xfrm_policy. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | xfrm: take care of request socketsEric Dumazet2015-12-071-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | TCP SYNACK messages might now be attached to request sockets. XFRM needs to get back to a listener socket. Adds new helpers that might be used elsewhere : sk_to_full_sk() and sk_const_to_full_sk() Note: We also need to add RCU protection for xfrm lookups, now TCP/DCCP have lockless listener processing. This will be addressed in separate patches. Fixes: ca6fb0651883 ("tcp: attach SYNACK messages to request sockets instead of listener") Reported-by: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'master' of ↵David S. Miller2015-10-302-2/+7
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2015-10-30 1) The flow cache is limited by the flow cache limit which depends on the number of cpus and the xfrm garbage collector threshold which is independent of the number of cpus. This leads to the fact that on systems with more than 16 cpus we hit the xfrm garbage collector limit and refuse new allocations, so new flows are dropped. On systems with 16 or less cpus, we hit the flowcache limit. In this case, we shrink the flow cache instead of refusing new flows. We increase the xfrm garbage collector threshold to INT_MAX to get the same behaviour, independent of the number of cpus. 2) Fix some unaligned accesses on sparc systems. From Sowmini Varadhan. 3) Fix some header checks in _decode_session4. We may call pskb_may_pull with a negative value converted to unsigened int from pskb_may_pull. This can lead to incorrect policy lookups. We fix this by a check of the data pointer position before we call pskb_may_pull. 4) Reload skb header pointers after calling pskb_may_pull in _decode_session4 as this may change the pointers into the packet. 5) Add a missing statistic counter on inner mode errors. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | xfrm: Increment statistic counter on inner mode errorSteffen Klassert2015-10-231-1/+3
| | | | | | | | | | | | | | | | | | | | | Increment the LINUX_MIB_XFRMINSTATEMODEERROR statistic counter to notify about dropped packets if we fail to fetch a inner mode. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| * | xfrm: Fix unaligned access to stats in copy_to_user_state()Sowmini Varadhan2015-10-231-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On sparc, deleting established SAs (e.g., by restarting ipsec) results in unaligned access messages via xfrm_del_sa -> km_state_notify -> xfrm_send_state_notify(). Even though struct xfrm_usersa_info is aligned on 8-byte boundaries, netlink attributes are fundamentally only 4 byte aligned, and this cannot be changed for nla_data() that is passed up to userspace. As a result, the put_unaligned() macro needs to be used to set up potentially unaligned fields such as the xfrm_stats in copy_to_user_state() Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
* | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2015-10-241-1/+3
|\ \ \ | | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: net/ipv6/xfrm6_output.c net/openvswitch/flow_netlink.c net/openvswitch/vport-gre.c net/openvswitch/vport-vxlan.c net/openvswitch/vport.c net/openvswitch/vport.h The openvswitch conflicts were overlapping changes. One was the egress tunnel info fix in 'net' and the other was the vport ->send() op simplification in 'net-next'. The xfrm6_output.c conflicts was also a simplification overlapping a bug fix. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | xfrm: Fix state threshold configuration from userspaceMichael Rossberg2015-09-291-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow to change the replay threshold (XFRMA_REPLAY_THRESH) and expiry timer (XFRMA_ETIMER_THRESH) of a state without having to set other attributes like replay counter and byte lifetime. Changing these other values while traffic flows will break the state. Signed-off-by: Michael Rossberg <michael.rossberg@tu-ilmenau.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
* | | dst: Pass net into dst->outputEric W. Biederman2015-10-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | The network namespace is already passed into dst_output pass it into dst->output lwt->output and friends. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv4, ipv6: Pass net into __ip_local_out and __ip6_local_outEric W. Biederman2015-10-081-1/+1
| | | | | | | | | | | | | | | Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | dst: Pass a sk into .local_outEric W. Biederman2015-10-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For consistency with the other similar methods in the kernel pass a struct sock into the dst_ops .local_out method. Simplifying the socket passing case is needed a prequel to passing a struct net reference into .local_out. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: Pass net into dst_output and remove dst_output_okfnEric W. Biederman2015-10-082-2/+2
| | | | | | | | | | | | | | | | | | | | | Replace dst_output_okfn with dst_output Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | xfrm: Only compute net once in xfrm_policy_queue_processEric W. Biederman2015-10-081-4/+3
| |/ |/| | | | | | | Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | inet: constify ip_route_output_flow() socket argumentEric Dumazet2015-09-251-3/+3
| | | | | | | | | | | | | | | | | | | | Very soon, TCP stack might call inet_csk_route_req(), which calls inet_csk_route_req() with an unlocked listener socket, so we need to make sure ip_route_output_flow() is not trying to change any field from its socket argument. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | netfilter: Add blank lines in callers of netfilter hooksEric W. Biederman2015-09-171-0/+1
| | | | | | | | | | | | | | | | | | | | | | In code review it was noticed that I had failed to add some blank lines in places where they are customarily used. Taking a second look at the code I have to agree blank lines would be nice so I have added them here. Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | netfilter: Pass net into okfnEric W. Biederman2015-09-171-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is immediately motivated by the bridge code that chains functions that call into netfilter. Without passing net into the okfns the bridge code would need to guess about the best expression for the network namespace to process packets in. As net is frequently one of the first things computed in continuation functions after netfilter has done it's job passing in the desired network namespace is in many cases a code simplification. To support this change the function dst_output_okfn is introduced to simplify passing dst_output as an okfn. For the moment dst_output_okfn just silently drops the struct net. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | netfilter: Pass struct net into the netfilter hooksEric W. Biederman2015-09-171-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pass a network namespace parameter into the netfilter hooks. At the call site of the netfilter hooks the path a packet is taking through the network stack is well known which allows the network namespace to be easily and reliabily. This allows the replacement of magic code like "dev_net(state->in?:state->out)" that appears at the start of most netfilter hooks with "state->net". In almost all cases the network namespace passed in is derived from the first network device passed in, guaranteeing those paths will not see any changes in practice. The exceptions are: xfrm/xfrm_output.c:xfrm_output_resume() xs_net(skb_dst(skb)->xfrm) ipvs/ip_vs_xmit.c:ip_vs_nat_send_or_cont() ip_vs_conn_net(cp) ipvs/ip_vs_xmit.c:ip_vs_send_or_cont() ip_vs_conn_net(cp) ipv4/raw.c:raw_send_hdrinc() sock_net(sk) ipv6/ip6_output.c:ip6_xmit() sock_net(sk) ipv6/ndisc.c:ndisc_send_skb() dev_net(skb->dev) not dev_net(dst->dev) ipv6/raw.c:raw6_send_hdrinc() sock_net(sk) br_netfilter_hooks.c:br_nf_pre_routing_finish() dev_net(skb->dev) before skb->dev is set to nf_bridge->physindev In all cases these exceptions seem to be a better expression for the network namespace the packet is being processed in then the historic "dev_net(in?in:out)". I am documenting them in case something odd pops up and someone starts trying to track down what happened. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: Merge dst_output and dst_output_skEric W. Biederman2015-09-172-2/+2
| | | | | | | | | | | | | | | | | | Add a sock paramter to dst_output making dst_output_sk superfluous. Add a skb->sk parameter to all of the callers of dst_output Have the callers of dst_output_sk call dst_output. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | xfrm: Remove unused afinfo method init_dstEric W. Biederman2015-09-171-2/+0
|/ | | | | Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds2015-09-032-15/+17
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking updates from David Miller: "Another merge window, another set of networking changes. I've heard rumblings that the lightweight tunnels infrastructure has been voted networking change of the year. But what do I know? 1) Add conntrack support to openvswitch, from Joe Stringer. 2) Initial support for VRF (Virtual Routing and Forwarding), which allows the segmentation of routing paths without using multiple devices. There are some semantic kinks to work out still, but this is a reasonably strong foundation. From David Ahern. 3) Remove spinlock fro act_bpf fast path, from Alexei Starovoitov. 4) Ignore route nexthops with a link down state in ipv6, just like ipv4. From Andy Gospodarek. 5) Remove spinlock from fast path of act_gact and act_mirred, from Eric Dumazet. 6) Document the DSA layer, from Florian Fainelli. 7) Add netconsole support to bcmgenet, systemport, and DSA. Also from Florian Fainelli. 8) Add Mellanox Switch Driver and core infrastructure, from Jiri Pirko. 9) Add support for "light weight tunnels", which allow for encapsulation and decapsulation without bearing the overhead of a full blown netdevice. From Thomas Graf, Jiri Benc, and a cast of others. 10) Add Identifier Locator Addressing support for ipv6, from Tom Herbert. 11) Support fragmented SKBs in iwlwifi, from Johannes Berg. 12) Allow perf PMUs to be accessed from eBPF programs, from Kaixu Xia. 13) Add BQL support to 3c59x driver, from Loganaden Velvindron. 14) Stop using a zero TX queue length to mean that a device shouldn't have a qdisc attached, use an explicit flag instead. From Phil Sutter. 15) Use generic geneve netdevice infrastructure in openvswitch, from Pravin B Shelar. 16) Add infrastructure to avoid re-forwarding a packet in software that was already forwarded by a hardware switch. From Scott Feldman. 17) Allow AF_PACKET fanout function to be implemented in a bpf program, from Willem de Bruijn" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1458 commits) netfilter: nf_conntrack: make nf_ct_zone_dflt built-in netfilter: nf_dup{4, 6}: fix build error when nf_conntrack disabled net: fec: clear receive interrupts before processing a packet ipv6: fix exthdrs offload registration in out_rt path xen-netback: add support for multicast control bgmac: Update fixed_phy_register() sock, diag: fix panic in sock_diag_put_filterinfo flow_dissector: Use 'const' where possible. flow_dissector: Fix function argument ordering dependency ixgbe: Resolve "initialized field overwritten" warnings ixgbe: Remove bimodal SR-IOV disabling ixgbe: Add support for reporting 2.5G link speed ixgbe: fix bounds checking in ixgbe_setup_tc for 82598 ixgbe: support for ethtool set_rxfh ixgbe: Avoid needless PHY access on copper phys ixgbe: cleanup to use cached mask value ixgbe: Remove second instance of lan_id variable ixgbe: use kzalloc for allocating one thing flow: Move __get_hash_from_flowi{4,6} into flow_dissector.c ixgbe: Remove unused PCI bus types ...
| * xfrm: Add oif to dst lookupsDavid Ahern2015-08-111-10/+14
| | | | | | | | | | | | | | | | | | Rules can be installed that direct route lookups to specific tables based on oif. Plumb the oif through the xfrm lookups so it gets set in the flow struct and passed to the resolver routines. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| * net/xfrm: use kmemdup rather than duplicating its implementationAndrzej Hajda2015-08-111-4/+2
| | | | | | | | | | | | | | | | | | | | The patch was generated using fixed coccinelle semantic patch scripts/coccinelle/api/memdup.cocci [1]. [1]: http://permalink.gmane.org/gmane.linux.kernel/2014320 Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| * xfrm: Fix a typoJakub Wilk2015-07-211-1/+1
| | | | | | | | | | Signed-off-by: Jakub Wilk <jwilk@jwilk.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipsec: Replace seqniv with seqivHerbert Xu2015-08-171-7/+7
|/ | | | | | | Now that seqniv is identical with seqiv we no longer need it. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds2015-06-244-30/+40
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking updates from David Miller: 1) Add TX fast path in mac80211, from Johannes Berg. 2) Add TSO/GRO support to ibmveth, from Thomas Falcon 3) Move away from cached routes in ipv6, just like ipv4, from Martin KaFai Lau. 4) Lots of new rhashtable tests, from Thomas Graf. 5) Run ingress qdisc lockless, from Alexei Starovoitov. 6) Allow servers to fetch TCP packet headers for SYN packets of new connections, for fingerprinting. From Eric Dumazet. 7) Add mode parameter to pktgen, for testing receive. From Alexei Starovoitov. 8) Cache access optimizations via simplifications of build_skb(), from Alexander Duyck. 9) Move page frag allocator under mm/, also from Alexander. 10) Add xmit_more support to hv_netvsc, from KY Srinivasan. 11) Add a counter guard in case we try to perform endless reclassify loops in the packet scheduler. 12) Extern flow dissector to be programmable and use it in new "Flower" classifier. From Jiri Pirko. 13) AF_PACKET fanout rollover fixes, performance improvements, and new statistics. From Willem de Bruijn. 14) Add netdev driver for GENEVE tunnels, from John W Linville. 15) Add ingress netfilter hooks and filtering, from Pablo Neira Ayuso. 16) Fix handling of epoll edge triggers in TCP, from Eric Dumazet. 17) Add an ECN retry fallback for the initial TCP handshake, from Daniel Borkmann. 18) Add tail call support to BPF, from Alexei Starovoitov. 19) Add several pktgen helper scripts, from Jesper Dangaard Brouer. 20) Add zerocopy support to AF_UNIX, from Hannes Frederic Sowa. 21) Favor even port numbers for allocation to connect() requests, and odd port numbers for bind(0), in an effort to help avoid ip_local_port_range exhaustion. From Eric Dumazet. 22) Add Cavium ThunderX driver, from Sunil Goutham. 23) Allow bpf programs to access skb_iif and dev->ifindex SKB metadata, from Alexei Starovoitov. 24) Add support for T6 chips in cxgb4vf driver, from Hariprasad Shenai. 25) Double TCP Small Queues default to 256K to accomodate situations like the XEN driver and wireless aggregation. From Wei Liu. 26) Add more entropy inputs to flow dissector, from Tom Herbert. 27) Add CDG congestion control algorithm to TCP, from Kenneth Klette Jonassen. 28) Convert ipset over to RCU locking, from Jozsef Kadlecsik. 29) Track and act upon link status of ipv4 route nexthops, from Andy Gospodarek. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1670 commits) bridge: vlan: flush the dynamically learned entries on port vlan delete bridge: multicast: add a comment to br_port_state_selection about blocking state net: inet_diag: export IPV6_V6ONLY sockopt stmmac: troubleshoot unexpected bits in des0 & des1 net: ipv4 sysctl option to ignore routes when nexthop link is down net: track link-status of ipv4 nexthops net: switchdev: ignore unsupported bridge flags net: Cavium: Fix MAC address setting in shutdown state drivers: net: xgene: fix for ACPI support without ACPI ip: report the original address of ICMP messages net/mlx5e: Prefetch skb data on RX net/mlx5e: Pop cq outside mlx5e_get_cqe net/mlx5e: Remove mlx5e_cq.sqrq back-pointer net/mlx5e: Remove extra spaces net/mlx5e: Avoid TX CQE generation if more xmit packets expected net/mlx5e: Avoid redundant dev_kfree_skb() upon NOP completion net/mlx5e: Remove re-assignment of wq type in mlx5e_enable_rq() net/mlx5e: Use skb_shinfo(skb)->gso_segs rather than counting them net/mlx5e: Static mapping of netdev priv resources to/from netdev TX queues net/mlx4_en: Use HW counters for rx/tx bytes/packets in PF device ...
| * Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2015-06-013-2/+19
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/net/phy/amd-xgbe-phy.c drivers/net/wireless/iwlwifi/Kconfig include/net/mac80211.h iwlwifi/Kconfig and mac80211.h were both trivial overlapping changes. The drivers/net/phy/amd-xgbe-phy.c file got removed in 'net-next' and the bug fix that happened on the 'net' side is already integrated into the rest of the amd-xgbe driver. Signed-off-by: David S. Miller <davem@davemloft.net>
| * \ Merge branch 'master' of ↵David S. Miller2015-05-283-30/+28
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2015-05-28 1) Remove xfrm_queue_purge as this is the same as skb_queue_purge. 2) Optimize policy and state walk. 3) Use a sane return code if afinfo registration fails. 4) Only check fori a acquire state if the state is not valid. 5) Remove a unnecessary NULL check before xfrm_pol_hold as it checks the input for NULL. 6) Return directly if the xfrm hold queue is empty, avoid to take a lock as it is nothing to do in this case. 7) Optimize the inexact policy search and allow for matching of policies with priority ~0U. All from Li RongQing. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | xfrm: optimise to search the inexact policy listLi RongQing2015-05-181-3/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The policies are organized into list by priority ascent of policy, so it is unnecessary to continue to loop the policy if the priority of current looped police is larger than or equal priority which is from the policy_bydst list. This allows to match policy with ~0U priority in inexact list too. Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| | * | xfrm: move the checking for old xfrm_policy hold_queue to beginningLi RongQing2015-05-051-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | if hold_queue of old xfrm_policy is NULL, return directly, then not need to run other codes, especially take the spin lock Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| | * | xfrm: remove the unnecessary checking before call xfrm_pol_holdLi RongQing2015-05-051-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | xfrm_pol_hold will check its input with NULL Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| | * | xfrm: slightly optimise xfrm_inputLi RongQing2015-04-241-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Check x->km.state with XFRM_STATE_ACQ only when state is not XFRM_STAT_VALID, not everytime Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| | * | xfrm: fix the return code when xfrm_*_register_afinfo failedLi RongQing2015-04-233-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If xfrm_*_register_afinfo failed since xfrm_*_afinfo[afinfo->family] had the value, return the -EEXIST, not -ENOBUFS Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| | * | xfrm: optimise the use of walk list header in xfrm_policy/state_walkLi RongQing2015-04-232-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The walk from input is the list header, and marked as dead, and will be skipped in loop. list_first_entry() can be used to return the true usable value from walk if walk is not empty Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| | * | xfrm: remove the xfrm_queue_purge definitionLi RongQing2015-04-231-10/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The task of xfrm_queue_purge is same as skb_queue_purge, so remove it Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| * | | net: make skb_dst_pop routine staticYing Xue2015-05-121-0/+12
| |/ / | | | | | | | | | | | | | | | | | | | | | As xfrm_output_one() is the only caller of skb_dst_pop(), we should make skb_dst_pop() localized. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds2015-06-222-9/+59
|\ \ \ | |_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull crypto update from Herbert Xu: "Here is the crypto update for 4.2: API: - Convert RNG interface to new style. - New AEAD interface with one SG list for AD and plain/cipher text. All external AEAD users have been converted. - New asymmetric key interface (akcipher). Algorithms: - Chacha20, Poly1305 and RFC7539 support. - New RSA implementation. - Jitter RNG. - DRBG is now seeded with both /dev/random and Jitter RNG. If kernel pool isn't ready then DRBG will be reseeded when it is. - DRBG is now the default crypto API RNG, replacing krng. - 842 compression (previously part of powerpc nx driver). Drivers: - Accelerated SHA-512 for arm64. - New Marvell CESA driver that supports DMA and more algorithms. - Updated powerpc nx 842 support. - Added support for SEC1 hardware to talitos" * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (292 commits) crypto: marvell/cesa - remove COMPILE_TEST dependency crypto: algif_aead - Temporarily disable all AEAD algorithms crypto: af_alg - Forbid the use internal algorithms crypto: echainiv - Only hold RNG during initialisation crypto: seqiv - Add compatibility support without RNG crypto: eseqiv - Offer normal cipher functionality without RNG crypto: chainiv - Offer normal cipher functionality without RNG crypto: user - Add CRYPTO_MSG_DELRNG crypto: user - Move cryptouser.h to uapi crypto: rng - Do not free default RNG when it becomes unused crypto: skcipher - Allow givencrypt to be NULL crypto: sahara - propagate the error on clk_disable_unprepare() failure crypto: rsa - fix invalid select for AKCIPHER crypto: picoxcell - Update to the current clk API crypto: nx - Check for bogus firmware properties crypto: marvell/cesa - add DT bindings documentation crypto: marvell/cesa - add support for Kirkwood and Dove SoCs crypto: marvell/cesa - add support for Orion SoCs crypto: marvell/cesa - add allhwsupport module parameter crypto: marvell/cesa - add support for all armada SoCs ...
| * | xfrm: Define ChaCha20-Poly1305 AEAD XFRM algo for IPsec usersMartin Willi2015-06-041-0/+12
| | | | | | | | | | | | | | | | | | Signed-off-by: Martin Willi <martin@strongswan.org> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * | ipsec: Add IV generator information to xfrm_stateHerbert Xu2015-05-281-9/+31
| | | | | | | | | | | | | | | | | | | | | This patch adds IV generator information to xfrm_state. This is currently obtained from our own list of algorithm descriptions. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * | xfrm: Add IV generator information to xfrm_algo_descHerbert Xu2015-05-281-0/+16
| |/ | | | | | | | | | | | | | | This patch adds IV generator information for each AEAD and block cipher to xfrm_algo_desc. This will be used to access the new AEAD interface. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* | xfrm: Override skb->mark with tunnel->parm.i_key in xfrm_inputAlexander Duyck2015-05-281-1/+16
| | | | | | | | | | | | | | | | | | This change makes it so that if a tunnel is defined we just use the mark from the tunnel instead of the mark from the skb header. By doing this we can avoid the need to set skb->mark inside of the tunnel receive functions. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
* | xfrm: Always zero high-order sequence number bitsHerbert Xu2015-05-211-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | As we're now always including the high bits of the sequence number in the IV generation process we need to ensure that they don't contain crap. This patch ensures that the high sequence bits are always zeroed so that we don't leak random data into the IV. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
* | xfrm: fix a race in xfrm_state_lookup_byspiLi RongQing2015-04-291-1/+1
|/ | | | | | | | | | | The returned xfrm_state should be hold before unlock xfrm_state_lock, otherwise the returned xfrm_state maybe be released. Fixes: c454997e6[{pktgen, xfrm} Introduce xfrm_state_lookup_byspi..] Cc: Fan Du <fan.du@intel.com> Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Acked-by: Fan Du <fan.du@intel.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2015-04-141-5/+5
|\ | | | | | | | | | | | | The dwmac-socfpga.c conflict was a case of a bug fix overlapping changes in net-next to handle an error pointer differently. Signed-off-by: David S. Miller <davem@davemloft.net>
| * xfrm: fix xfrm_input/xfrm_tunnel_check oopsAlexey Dobriyan2015-04-071-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://bugzilla.kernel.org/show_bug.cgi?id=95211 Commit 70be6c91c86596ad2b60c73587880b47df170a41 ("xfrm: Add xfrm_tunnel_skb_cb to the skb common buffer") added check which dereferences ->outer_mode too early but larval SAs don't have this pointer set (yet). So check for tunnel stuff later. Mike Noordermeer reported this bug and patiently applied all the debugging. Technically this is remote-oops-in-interrupt-context type of thing. BUG: unable to handle kernel NULL pointer dereference at 0000000000000034 IP: [<ffffffff8150dca2>] xfrm_input+0x3c2/0x5a0 ... [<ffffffff81500fc6>] ? xfrm4_esp_rcv+0x36/0x70 [<ffffffff814acc9a>] ? ip_local_deliver_finish+0x9a/0x200 [<ffffffff81471b83>] ? __netif_receive_skb_core+0x6f3/0x8f0 ... RIP [<ffffffff8150dca2>] xfrm_input+0x3c2/0x5a0 Kernel panic - not syncing: Fatal exception in interrupt Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
OpenPOWER on IntegriCloud