summaryrefslogtreecommitdiffstats
path: root/net/ipv6/ip6_fib.c
Commit message (Collapse)AuthorAgeFilesLines
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2014-05-121-1/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/net/ethernet/altera/altera_sgdma.c net/netlink/af_netlink.c net/sched/cls_api.c net/sched/sch_api.c The netlink conflict dealt with moving to netlink_capable() and netlink_ns_capable() in the 'net' tree vs. supporting 'tc' operations in non-init namespaces. These were simple transformations from netlink_capable to netlink_ns_capable. The Altera driver conflict was simply code removal overlapping some void pointer cast cleanups in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>
| * ipv6: fib: fix fib dump restartKumar Sundararajan2014-04-241-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the ipv6 fib changes during a table dump, the walk is restarted and the number of nodes dumped are skipped. But the existing code doesn't advance to the next node after a node is skipped. This can cause the dump to loop or produce lots of duplicates when the fib is modified during the dump. This change advances the walk to the next node if the current node is skipped after a restart. Signed-off-by: Kumar Sundararajan <kumar@fb.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv6: remove parameter rt from fib6_prune_clones()Duan Jiong2014-05-121-7/+5
|/ | | | | | | | | the parameter rt will be assigned to c.arg in function fib6_clean_tree(), but function fib6_prune_clone() doesn't use c.arg, so we can remove it safely. Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: fix checkpatch errors of "foo*" and "foo * bar"Wang Yufen2014-03-291-15/+15
| | | | | | | | | | ERROR: "(foo*)" should be "(foo *)" ERROR: "foo * bar" should be "foo *bar" Suggested-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: Wang Yufen <wangyufen@huawei.com> Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: fix checkpatch errors of brace and trailing statementsWang Yufen2014-03-291-8/+10
| | | | | | | | | ERROR: open brace '{' following enum go on the same line ERROR: open brace '{' following struct go on the same line ERROR: trailing statements should be on next line Signed-off-by: Wang Yufen <wangyufen@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: fix checkpatch errors comments and spaceWang Yufen2014-03-291-14/+12
| | | | | | | | | | | WARNING: please, no space before tabs WARNING: please, no spaces at the start of a line ERROR: spaces required around that ':' (ctx:VxW) ERROR: spaces required around that '>' (ctx:VxV) ERROR: spaces required around that '>=' (ctx:VxV) Signed-off-by: Wang Yufen <wangyufen@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: do not overwrite inetpeer metrics prematurelyMichal Kubeček2014-03-271-3/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If an IPv6 host route with metrics exists, an attempt to add a new route for the same target with different metrics fails but rewrites the metrics anyway: 12sp0:~ # ip route add fec0::1 dev eth0 rto_min 1000 12sp0:~ # ip -6 route show fe80::/64 dev eth0 proto kernel metric 256 fec0::1 dev eth0 metric 1024 rto_min lock 1s 12sp0:~ # ip route add fec0::1 dev eth0 rto_min 1500 RTNETLINK answers: File exists 12sp0:~ # ip -6 route show fe80::/64 dev eth0 proto kernel metric 256 fec0::1 dev eth0 metric 1024 rto_min lock 1.5s This is caused by all IPv6 host routes using the metrics in their inetpeer (or the shared default). This also holds for the new route created in ip6_route_add() which shares the metrics with the already existing route and thus ip6_route_add() rewrites the metrics even if the new route ends up not being used at all. Another problem is that old metrics in inetpeer can reappear unexpectedly for a new route, e.g. 12sp0:~ # ip route add fec0::1 dev eth0 rto_min 1000 12sp0:~ # ip route del fec0::1 12sp0:~ # ip route add fec0::1 dev eth0 12sp0:~ # ip route change fec0::1 dev eth0 hoplimit 10 12sp0:~ # ip -6 route show fe80::/64 dev eth0 proto kernel metric 256 fec0::1 dev eth0 metric 1024 hoplimit 10 rto_min lock 1s Resolve the first problem by moving the setting of metrics down into fib6_add_rt2node() to the point we are sure we are inserting the new route into the tree. Second problem is addressed by introducing new flag DST_METRICS_FORCE_OVERWRITE which is set for a new host route in ip6_route_add() and makes ipv6_cow_metrics() always overwrite the metrics in inetpeer (even if they are not "new"); it is reset after that. v5: use a flag in _metrics member rather than one in flags v4: fix a typo making a condition always true (thanks to Hannes Frederic Sowa) v3: rewritten based on David Miller's idea to move setting the metrics (and allocation in non-host case) down to the point we already know the route is to be inserted. Also rebased to net-next as it is quite late in the cycle. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: remove prune parameter for fib6_clean_allLi RongQing2014-01-021-3/+3
| | | | | | | | since the prune parameter for fib6_clean_all always is 0, remove it. Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: compare sernum when walking fib for /proc/net/ipv6_route as safety netHannes Frederic Sowa2013-09-271-0/+14
| | | | | | | | | | | | | | | | | | This patch provides an additional safety net against NULL pointer dereferences while walking the fib trie for the new /proc/net/ipv6_route walkers. I never needed it myself and am unsure if it is needed at all, but the same checks where introduced in 2bec5a369ee79576a3eea2c23863325089785a2c ("ipv6: fib: fix crash when changing large fib while dumping it") to fix NULL pointer bugs. This patch is separated from the first patch to make it easier to revert if we are sure we can drop this logic. Cc: Ben Greear <greearb@candelatech.com> Cc: Patrick McHardy <kaber@trash.net> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: avoid high order memory allocations for /proc/net/ipv6_routeHannes Frederic Sowa2013-09-271-19/+172
| | | | | | | | | | | | | | | | | | | Dumping routes on a system with lots rt6_infos in the fibs causes up to 11-order allocations in seq_file (which fail). While we could switch there to vmalloc we could just implement the streaming interface for /proc/net/ipv6_route. This patch switches /proc/net/ipv6_route from single_open_net to seq_open_net. loff_t *pos tracks dst entries. Also kill never used struct rt6_proc_arg and now unused function fib6_clean_all_ro. Cc: Ben Greear <greearb@candelatech.com> Cc: Patrick McHardy <kaber@trash.net> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: fib: fib6_add: fix potential NULL pointer dereferenceDaniel Borkmann2013-09-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the kernel is compiled with CONFIG_IPV6_SUBTREES, and we return with an error in fn = fib6_add_1(), then error codes are encoded into the return pointer e.g. ERR_PTR(-ENOENT). In such an error case, we write the error code into err and jump to out, hence enter the if(err) condition. Now, if CONFIG_IPV6_SUBTREES is enabled, we check for: if (pn != fn && pn->leaf == rt) ... if (pn != fn && !pn->leaf && !(pn->fn_flags & RTN_RTINFO)) ... Since pn is NULL and fn is f.e. ERR_PTR(-ENOENT), then pn != fn evaluates to true and causes a NULL-pointer dereference on further checks on pn. Fix it, by setting both NULL in error case, so that pn != fn already evaluates to false and no further dereference takes place. This was first correctly implemented in 4a287eba2 ("IPv6 routing, NLM_F_* flag support: REPLACE and EXCL flags support, warn about missing CREATE flag"), but the bug got later on introduced by 188c517a0 ("ipv6: return errno pointers consistently for fib6_add_1()"). Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Lin Ming <mlin@ss.pku.edu.cn> Cc: Matti Vaittinen <matti.vaittinen@nsn.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Matti Vaittinen <matti.vaittinen@nsn.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2013-08-161-4/+12
|\
| * ipv6: don't stop backtracking in fib6_lookup_1 if subtree does not matchHannes Frederic Sowa2013-08-071-4/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In case a subtree did not match we currently stop backtracking and return NULL (root table from fib_lookup). This could yield in invalid routing table lookups when using subtrees. Instead continue to backtrack until a valid subtree or node is found and return this match. Also remove unneeded NULL check. Reported-by: Teco Boot <teco@inf-net.nl> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Cc: David Lamparter <equinox@diac24.net> Cc: <boutier@pps.univ-paris-diderot.fr> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2013-08-031-12/+13
|\ \ | |/ | | | | | | | | | | Merge net into net-next to setup some infrastructure Eric Dumazet needs for usbnet changes. Signed-off-by: David S. Miller <davem@davemloft.net>
| * ipv6: update ip6_rt_last_gc every time GC is runMichal Kubeček2013-08-011-1/+5
| | | | | | | | | | | | | | | | | | | | As pointed out by Eric Dumazet, net->ipv6.ip6_rt_last_gc should hold the last time garbage collector was run so that we should update it whenever fib6_run_gc() calls fib6_clean_all(), not only if we got there from ip6_dst_gc(). Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ipv6: prevent fib6_run_gc() contentionMichal Kubeček2013-08-011-11/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On a high-traffic router with many processors and many IPv6 dst entries, soft lockup in fib6_run_gc() can occur when number of entries reaches gc_thresh. This happens because fib6_run_gc() uses fib6_gc_lock to allow only one thread to run the garbage collector but ip6_dst_gc() doesn't update net->ipv6.ip6_rt_last_gc until fib6_run_gc() returns. On a system with many entries, this can take some time so that in the meantime, other threads pass the tests in ip6_dst_gc() (ip6_rt_last_gc is still not updated) and wait for the lock. They then have to run the garbage collector one after another which blocks them for quite long. Resolve this by replacing special value ~0UL of expire parameter to fib6_run_gc() by explicit "force" parameter to choose between spin_lock_bh() and spin_trylock_bh() and call fib6_run_gc() with force=false if gc_thresh is reached but not max_size. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: ipv6 eliminate parameter "int addrlen" in function fib6_add_1fan.du2013-07-241-8/+8
|/ | | | | | | | | | The "int addrlen" in fib6_add_1 is rebundant, as we can get it from parameter "struct in6_addr *addr" once we modified its type. And also fix some coding style issues in fib6_add_1 Signed-off-by: Fan Du <fan.du@windriver.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: only static routes qualify for equal cost multipathingHannes Frederic Sowa2013-07-121-4/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | Static routes in this case are non-expiring routes which did not get configured by autoconf or by icmpv6 redirects. To make sure we actually get an ecmp route while searching for the first one in this fib6_node's leafs, also make sure it matches the ecmp route assumptions. v2: a) Removed RTF_EXPIRE check in dst.from chain. The check of RTF_ADDRCONF already ensures that this route, even if added again without RTF_EXPIRES (in case of a RA announcement with infinite timeout), does not cause the rt6i_nsiblings logic to go wrong if a later RA updates the expiration time later. v3: a) Allow RTF_EXPIRES routes to enter the ecmp route set. We have to do so, because an pmtu event could update the RTF_EXPIRES flag and we would not count this route, if another route joins this set. We now filter only for RTF_GATEWAY|RTF_ADDRCONF|RTF_DYNAMIC, which are flags that don't get changed after rt6_info construction. Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* hlist: drop the node parameter from iteratorsSasha Levin2013-02-271-8/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I'm not sure why, but the hlist for each entry iterators were conceived list_for_each_entry(pos, head, member) The hlist ones were greedy and wanted an extra parameter: hlist_for_each_entry(tpos, pos, head, member) Why did they need an extra pos parameter? I'm not quite sure. Not only they don't really need it, it also prevents the iterator from looking exactly like the list iterator, which is unfortunate. Besides the semantic patch, there was some manual work required: - Fix up the actual hlist iterators in linux/list.h - Fix up the declaration of other iterators based on the hlist ones. - A very small amount of places were using the 'node' parameter, this was modified to use 'obj->member' instead. - Coccinelle didn't handle the hlist_for_each_entry_safe iterator properly, so those had to be fixed up manually. The semantic patch which is mostly the work of Peter Senna Tschudin is here: @@ iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host; type T; expression a,c,d,e; identifier b; statement S; @@ -T b; <+... when != b ( hlist_for_each_entry(a, - b, c, d) S | hlist_for_each_entry_continue(a, - b, c) S | hlist_for_each_entry_from(a, - b, c) S | hlist_for_each_entry_rcu(a, - b, c, d) S | hlist_for_each_entry_rcu_bh(a, - b, c, d) S | hlist_for_each_entry_continue_rcu_bh(a, - b, c) S | for_each_busy_worker(a, c, - b, d) S | ax25_uid_for_each(a, - b, c) S | ax25_for_each(a, - b, c) S | inet_bind_bucket_for_each(a, - b, c) S | sctp_for_each_hentry(a, - b, c) S | sk_for_each(a, - b, c) S | sk_for_each_rcu(a, - b, c) S | sk_for_each_from -(a, b) +(a) S + sk_for_each_from(a) S | sk_for_each_safe(a, - b, c, d) S | sk_for_each_bound(a, - b, c) S | hlist_for_each_entry_safe(a, - b, c, d, e) S | hlist_for_each_entry_continue_rcu(a, - b, c) S | nr_neigh_for_each(a, - b, c) S | nr_neigh_for_each_safe(a, - b, c, d) S | nr_node_for_each(a, - b, c) S | nr_node_for_each_safe(a, - b, c, d) S | - for_each_gfn_sp(a, c, d, b) S + for_each_gfn_sp(a, c, d) S | - for_each_gfn_indirect_valid_sp(a, c, d, b) S + for_each_gfn_indirect_valid_sp(a, c, d) S | for_each_host(a, - b, c) S | for_each_host_safe(a, - b, c, d) S | for_each_mesh_entry(a, - b, c, d) S ) ...+> [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c] [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c] [akpm@linux-foundation.org: checkpatch fixes] [akpm@linux-foundation.org: fix warnings] [akpm@linux-foudnation.org: redo intrusive kvm changes] Tested-by: Peter Senna Tschudin <peter.senna@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ipv6: add support of equal cost multipath (ECMP)Nicolas Dichtel2012-10-231-0/+57
| | | | | | | | | | | | | | | | Each nexthop is added like a single route in the routing table. All routes that have the same metric/weight and destination but not the same gateway are considering as ECMP routes. They are linked together, through a list called rt6i_siblings. ECMP routes can be added in one shot, with RTA_MULTIPATH attribute or one after the other (in both case, the flag NLM_F_EXCL should not be set). The patch is based on a previous work from Luc Saillard <luc.saillard@6wind.com>. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: return errno pointers consistently for fib6_add_1()Lin Ming2012-09-281-14/+6
| | | | | | | | fib6_add_1() should consistently return errno pointers, rather than a mixture of NULL and errno pointers. Signed-off-by: Lin Ming <mlin@ss.pku.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: fix return value check in fib6_add()Wei Yongjun2012-09-211-0/+4
| | | | | | | | | | | In case of error, the function fib6_add_1() returns ERR_PTR() or NULL pointer. The ERR_PTR() case check is missing in fib6_add(). dpatch engine is used to generated this patch. (https://github.com/weiyj/dpatch) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2012-06-251-2/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/net/usb/qmi_wwan.c net/batman-adv/translation-table.c net/ipv6/route.c qmi_wwan.c resolution provided by Bjørn Mork. batman-adv conflict is dealing merely with the changes of global function names to have a proper subsystem prefix. ipv6's route.c conflict is merely two side-by-side additions of network namespace methods. Signed-off-by: David S. Miller <davem@davemloft.net>
| * ipv6: fib: fix fib dump restartEric Dumazet2012-06-251-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 2bec5a369ee79576a3 (ipv6: fib: fix crash when changing large fib while dumping it) introduced ability to restart the dump at tree root, but failed to skip correctly a count of already dumped entries. Code didn't match Patrick intent. We must skip exactly the number of already dumped entries. Note that like other /proc/net files or netlink producers, we could still dump some duplicates entries. Reported-by: Debabrata Banerjee <dbavatar@gmail.com> Reported-by: Josh Hunt <johunt@akamai.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2012-06-161-11/+7
|\ \ | |/ | | | | | | | | | | | | | | | | Conflicts: net/ipv6/route.c Pull in 'net' again to get the revert of Thomas's change which introduced regressions. Signed-off-by: David S. Miller <davem@davemloft.net>
| * Revert "ipv6: Prevent access to uninitialized fib_table_hash via ↵David S. Miller2012-06-161-11/+7
| | | | | | | | | | | | | | | | | | | | | | /proc/net/ipv6_route" This reverts commit 2a0c451ade8e1783c5d453948289e4a978d417c9. It causes crashes, because now ip6_null_entry is used before it is initialized. Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2012-06-151-7/+11
|\ \ | |/ | | | | | | | | | | | | | | | | | | | | Conflicts: net/ipv6/route.c This deals with a merge conflict between the net-next addition of the inetpeer network namespace ops, and Thomas Graf's bug fix in 2a0c451ade8e1783c5d453948289e4a978d417c9 which makes sure we don't register /proc/net/ipv6_route before it is actually safe to do so. Signed-off-by: David S. Miller <davem@davemloft.net>
| * ipv6: Prevent access to uninitialized fib_table_hash via /proc/net/ipv6_routeThomas Graf2012-06-151-7/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | /proc/net/ipv6_route reflects the contents of fib_table_hash. The proc handler is installed in ip6_route_net_init() whereas fib_table_hash is allocated in fib6_net_init() _after_ the proc handler has been installed. This opens up a short time frame to access fib_table_hash with its pants down. fib6_init() as a whole can't be moved to an earlier position as it also registers the rtnetlink message handlers which should be registered at the end. Therefore split it into fib6_init() which is run early and fib6_init_late() to register the rtnetlink message handlers. Signed-off-by: Thomas Graf <tgraf@suug.ch> Reviewed-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2012-06-121-1/+1
|\ \ | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: MAINTAINERS drivers/net/wireless/iwlwifi/pcie/trans.c The iwlwifi conflict was resolved by keeping the code added in 'net' that turns off the buggy chip feature. The MAINTAINERS conflict was merely overlapping changes, one change updated all the wireless web site URLs and the other changed some GIT trees to be Johannes's instead of John's. Signed-off-by: David S. Miller <davem@davemloft.net>
| * ipv6: fib: Restore NTF_ROUTER exception in fib6_age()Thomas Graf2012-06-071-1/+1
| | | | | | | | | | | | | | | | | | Commit 5339ab8b1dd82 (ipv6: fib: Convert fib6_age() to dst_neigh_lookup().) seems to have mistakenly inverted the exception for cached NTF_ROUTER routes. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* | inet: Add inetpeer tree roots to the FIB tables.David S. Miller2012-06-111-0/+5
|/ | | | Signed-off-by: David S. Miller <davem@davemloft.net>
* net: ipv4 and ipv6: Convert printk(KERN_DEBUG to pr_debugJoe Perches2012-05-161-2/+3
| | | | | | | Use the current debugging style and enable dynamic_debug. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: ipv6: Standardize prefixes for message loggingJoe Perches2012-05-161-10/+10
| | | | | | | | | | | | | | | | Add #define pr_fmt(fmt) as appropriate. Add "IPv6: " to appropriate files. Convert printk(KERN_<LEVEL> to pr_<level> (but not KERN_DEBUG). Standardize on "%s: " not "%s(): " when emitting __func__. Use "%s: ", __func__ instead of embedding function name. Coalesce formats, align arguments. ADDRCONF output is now prefixed with "IPv6: " Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: fix problem with expired dst cacheGao feng2012-04-131-5/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the ipv6 dst cache which copy from the dst generated by ICMPV6 RA packet. this dst cache will not check expire because it has no RTF_EXPIRES flag. So this dst cache will always be used until the dst gc run. Change the struct dst_entry,add a union contains new pointer from and expires. When rt6_info.rt6i_flags has no RTF_EXPIRES flag,the dst.expires has no use. we can use this field to point to where the dst cache copy from. The dst.from is only used in IPV6. rt6_check_expired check if rt6_info.dst.from is expired. ip6_rt_copy only set dst.from when the ort has flag RTF_ADDRCONF and RTF_DEFAULT.then hold the ort. ip6_dst_destroy release the ort. Add some functions to operate the RTF_EXPIRES flag and expires(from) together. and change the code to use these new adding functions. Changes from v5: modify ip6_route_add and ndisc_router_discovery to use new adding functions. Only set dst.from when the ort has flag RTF_ADDRCONF and RTF_DEFAULT.then hold the ort. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: fib: Convert fib6_age() to dst_neigh_lookup().David S. Miller2012-01-271-5/+14
| | | | | | | | In this specific situation we know we are dealing with a gatewayed route and therefore rt6i_gateway is not going to be in6addr_any even in future interpretations. Signed-off-by: David S. Miller <davem@davemloft.net>
* IPv6: Avoid taking write lock for /proc/net/ipv6_routeJosh Hunt2011-12-301-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During some debugging I needed to look into how /proc/net/ipv6_route operated and in my digging I found its calling fib6_clean_all() which uses "write_lock_bh(&table->tb6_lock)" before doing the walk of the table. I found this on 2.6.32, but reading the code I believe the same basic idea exists currently. Looking at the rtnetlink code they are only calling "read_lock_bh(&table->tb6_lock);" via fib6_dump_table(). While I realize reading from proc isn't the recommended way of fetching the ipv6 route table; taking a write lock seems unnecessary and would probably cause network performance issues. To verify this I loaded up the ipv6 route table and then ran iperf in 3 cases: * doing nothing * reading ipv6 route table via proc (while :; do cat /proc/net/ipv6_route > /dev/null; done) * reading ipv6 route table via rtnetlink (while :; do ip -6 route show table all > /dev/null; done) * Load the ipv6 route table up with: * for ((i = 0;i < 4000;i++)); do ip route add unreachable 2000::$i; done * iperf commands: * client: iperf -i 1 -V -c <ipv6 addr> * server: iperf -V -s * iperf results - 3 runs each (in Mbits/sec) * nothing: client: 927,927,927 server: 927,927,927 * proc: client: 179,97,96,113 server: 142,112,133 * iproute: client: 928,927,928 server: 927,927,927 lock_stat shows taking the write lock is causing the slowdown. Using this info I decided to write a version of fib6_clean_all() which replaces write_lock_bh(&table->tb6_lock) with read_lock_bh(&table->tb6_lock). With this new function I see the same results as with my rtnetlink iperf test. Signed-off-by: Josh Hunt <joshhunt00@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: Kill rt6i_dev and rt6i_expires defines.David S. Miller2011-12-281-5/+5
| | | | | | | | | It just obscures that the netdevice pointer and the expires value are implemented in the dst_entry sub-object of the ipv6 route. And it makes grepping for dst_entry member uses much harder too. Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Rename dst_get_neighbour{, _raw} to dst_get_neighbour_noref{, _raw}.David Miller2011-12-051-1/+1
| | | | | | | | To reflect the fact that a refrence is not obtained to the resulting neighbour entry. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Roland Dreier <roland@purestorage.com>
* ipv6: Various cleanups in ip6_route.cDavid S. Miller2011-12-031-59/+56
| | | | | | | | | | | | | | | 1) x == NULL --> !x 2) x != NULL --> x 3) if() --> if () 4) while() --> while () 5) (x & BIT) == 0 --> !(x & BIT) 6) (x&BIT) --> (x & BIT) 7) x=y --> x = y 8) (BIT1|BIT2) --> (BIT1 | BIT2) 9) if ((x & BIT)) --> if (x & BIT) 10) proper argument and struct member alignment Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: Use pr_warn() in ip6_fib.cDavid S. Miller2011-11-171-10/+10
| | | | Signed-off-by: David S. Miller <davem@davemloft.net>
* IPV6 Fix a crash when trying to replace non existing routeMatti Vaittinen2011-11-171-7/+14
| | | | | | | | | | | | | | This patch fixes a crash when non existing IPv6 route is tried to be changed. When new destination node was inserted in middle of FIB6 tree, no relevant sanity checks were performed. Later route insertion might have been prevented due to invalid request, causing node with no rt info being left in tree. When this node was accessed, a crash occurred. Patch adds missing checks in fib6_add_1() Signed-off-by: Matti Vaittinen <Mazziesaccount@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* IPv6: Removing unnecessary NULL checks.Matti Vaittinen2011-11-151-4/+3
| | | | | | | | | This patch removes unnecessary NULL checks noticed by Dan Carpenter. Checks were introduced in commit 4a287eba2de395713d8b2b2aeaa69fa086832d34 to net-next. Signed-off-by: Matti Vaittinen <Mazziesaccount@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* IPv6 routing, NLM_F_* flag support: REPLACE and EXCL flags support, warn ↵Matti Vaittinen2011-11-141-15/+94
| | | | | | | | | | | | | | | | | | | | | | | | | | | | about missing CREATE flag The support for NLM_F_* flags at IPv6 routing requests. If NLM_F_CREATE flag is not defined for RTM_NEWROUTE request, warning is printed, but no error is returned. Instead new route is added. Later NLM_F_CREATE may be required for new route creation. Exception is when NLM_F_REPLACE flag is given without NLM_F_CREATE, and no matching route is found. In this case it should be safe to assume that the request issuer is familiar with NLM_F_* flags, and does really not want route to be created. Specifying NLM_F_REPLACE flag will now make the kernel to search for matching route, and replace it with new one. If no route is found and NLM_F_CREATE is specified as well, then new route is created. Also, specifying NLM_F_EXCL will yield returning of error if matching route is found. Patch created against linux-3.2-rc1 Signed-off-by: Matti Vaittinen <Mazziesaccount@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* cleanup: remove unnecessary include.Kevin Wilson2011-10-191-4/+0
| | | | | | | This cleanup patch removes unnecessary include from net/ipv6/ip6_fib.c. Signed-off-by: Kevin Wilson <wkevils@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: fix NULL dereferences in check_peer_redir()Eric Dumazet2011-08-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Gergely Kalman reported crashes in check_peer_redir(). It appears commit f39925dbde778 (ipv4: Cache learned redirect information in inetpeer.) added a race, leading to possible NULL ptr dereference. Since we can now change dst neighbour, we should make sure a reader can safely use a neighbour. Add RCU protection to dst neighbour, and make sure check_peer_redir() can be called safely by different cpus in parallel. As neighbours are already freed after one RCU grace period, this patch should not add typical RCU penalty (cache cold effects) Many thanks to Gergely for providing a pretty report pointing to the bug. Reported-by: Gergely Kalman <synapse@hippy.csoma.elte.hu> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Abstract dst->neighbour accesses behind helpers.David S. Miller2011-07-171-1/+1
| | | | | | dst_{get,set}_neighbour() Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: Get rid of rt6i_nexthop macro.David S. Miller2011-07-171-1/+1
| | | | | | | It just makes it harder to see 1) what the code is doing and 2) grep for all users of dst{->,.}neighbour Signed-off-by: David S. Miller <davem@davemloft.net>
* rtnetlink: Compute and store minimum ifinfo dump sizeGreg Rose2011-06-091-1/+2
| | | | | | | | | | | | | | | The message size allocated for rtnl ifinfo dumps was limited to a single page. This is not enough for additional interface info available with devices that support SR-IOV and caused a bug in which VF info would not be displayed if more than approximately 40 VFs were created per interface. Implement a new function pointer for the rtnl_register service that will calculate the amount of data required for the ifinfo dump and allocate enough data to satisfy the request. Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* net: dont hold rtnl mutex during netlink dump callbacksEric Dumazet2011-05-021-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Four years ago, Patrick made a change to hold rtnl mutex during netlink dump callbacks. I believe it was a wrong move. This slows down concurrent dumps, making good old /proc/net/ files faster than rtnetlink in some situations. This occurred to me because one "ip link show dev ..." was _very_ slow on a workload adding/removing network devices in background. All dump callbacks are able to use RCU locking now, so this patch does roughly a revert of commits : 1c2d670f366 : [RTNETLINK]: Hold rtnl_mutex during netlink dump callbacks 6313c1e0992 : [RTNETLINK]: Remove unnecessary locking in dump callbacks This let writers fight for rtnl mutex and readers going full speed. It also takes care of phonet : phonet_route_get() is now called from rcu read section. I renamed it to phonet_route_get_rcu() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Patrick McHardy <kaber@trash.net> Cc: Remi Denis-Courmont <remi.denis-courmont@nokia.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* inet: constify ip headers and in6_addrEric Dumazet2011-04-221-8/+8
| | | | | | | | Add const qualifiers to structs iphdr, ipv6hdr and in6_addr pointers where possible, to make code intention more obvious. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
OpenPOWER on IntegriCloud