summaryrefslogtreecommitdiffstats
path: root/net/ipv4/route.c
Commit message (Collapse)AuthorAgeFilesLines
* mib: add net to IP_INC_STATS_BHPavel Emelyanov2008-07-161-1/+2
| | | | | Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: remove flush_mutex from ipv4_sysctl_rtcache_flushDenis V. Lunev2008-07-081-8/+6
| | | | | | | | | | It is possible to avoid locking at all in ipv4_sysctl_rtcache_flush by defining local ctl_table on the stack. The patch is based on the suggestion from Eric W. Biederman. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* netns: selective flush of rt_cacheDenis V. Lunev2008-07-051-1/+30
| | | | | | | | | dst cache is marked as expired on the per/namespace basis by previous path. Right now we have to implement selective cache shrinking. This procedure has been ported from older OpenVz codebase. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* netns: place rt_genid into struct netDenis V. Lunev2008-07-051-33/+43
| | | | | Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: pass current value of rt_genid into rt_hashDenis V. Lunev2008-07-051-11/+17
| | | | | | | | Basically, there is no difference to atomic_read internally or pass it as a parameter as rt_hash is inline. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* netns: add struct net parameter to rt_cache_invalidateDenis V. Lunev2008-07-051-3/+3
| | | | | Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* netns: make rt_secret_rebuild timer per namespaceDenis V. Lunev2008-07-051-10/+30
| | | | | Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* netns: register net.ipv4.route.flush in each namespaceDenis V. Lunev2008-07-051-10/+69
| | | | | Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: remove static flush_delay variableDenis V. Lunev2008-07-051-3/+8
| | | | | | | | | | | | | | | flush delay is used as an external storage for net.ipv4.route.flush sysctl entry. It is write-only. The ctl_table->data for this entry is used once. Fix this case to point to the stack to remove global variable. Do this to avoid additional variable on struct net in the next patch. Possible race (as it was before) accessing this local variable is removed using flush_mutex. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* netns: add namespace parameter to rt_cache_flushDenis V. Lunev2008-07-051-4/+4
| | | | | Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: remove CVS keywordsAdrian Bunk2008-06-111-2/+0
| | | | | | | | This patch removes CVS keywords that weren't updated for a long time from comments. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* route: Mark unused route cache flags as such.Thomas Graf2008-06-031-1/+1
| | | | | | | Also removes an obsolete check for the unused flag RTCF_MASQ. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipsec: Use the correct ip_local_out functionHerbert Xu2008-05-201-1/+1
| | | | | | | | | | | | | | | | | | | Because the IPsec output function xfrm_output_resume does its own dst_output call it should always call __ip_local_output instead of ip_local_output as the latter may invoke dst_output directly. Otherwise the return values from nf_hook and dst_output may clash as they both use the value 1 but for different purposes. When that clash occurs this can cause a packet to be used after it has been freed which usually leads to a crash. Because the offending value is only returned from dst_output with qdiscs such as HTB, this bug is normally not visible. Thanks to Marco Berizzi for his perseverance in tracking this down. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* ip: Use inline function dst_metric() instead of direct access to dst->metric[]Satoru SATOH2008-05-041-8/+8
| | | | | | | | | | | | There are functions to refer to the value of dst->metric[THE_METRIC-1] directly without use of a inline function "dst_metric" defined in net/dst.h. The following patch changes them to use the inline function consistently. Signed-off-by: Satoru SATOH <satoru.satoh@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ip: Make use of the inline function dst_metric_locked()Satoru SATOH2008-05-041-1/+1
| | | | | Signed-off-by: Satoru SATOH <satoru.satoh@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Update MTU to all related cache entries in ip_rt_frag_needed()Timo Teras2008-04-291-16/+22
| | | | | | | | | Add struct net_device parameter to ip_rt_frag_needed() and update MTU to cache entries where ifindex is specified. This is similar to what is already done in ip_rt_redirect(). Signed-off-by: Timo Teras <timo.teras@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Fix heavy stack usage in seq_file output routines.Pavel Emelyanov2008-04-241-5/+6
| | | | | | | | | Plan C: we can follow the Al Viro's proposal about %n like in this patch. The same applies to udp, fib (the /proc/net/route file), rt_cache and sctp debug. This is minus ~150-200 bytes for each. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: Use NIPQUAD_FMT to format ipv4 addresses.YOSHIFUJI Hideaki2008-04-141-13/+13
| | | | | | | And use %u to format port. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* IPV4: use xor rather than multiple ands for route compareStephen Hemminger2008-04-101-5/+5
| | | | | | | | | | | The comparison in ip_route_input is a hot path, by recoding the C "and" as bit operations, fewer conditional branches get generated so the code should be faster. Maybe someday Gcc will be smart enough to do this? Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* IPV4: route rekey timer can be deferrableStephen Hemminger2008-04-101-1/+3
| | | | | | | No urgency on the rehash interval timer, so mark it as deferrable. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* IPV4: route use jhash3Stephen Hemminger2008-04-101-6/+4
| | | | | | | | Since route hash is a triple, use jhash_3words rather doing the mixing directly. This should be as fast and give better distribution. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* IPV4: route inline changesStephen Hemminger2008-04-101-27/+27
| | | | | | | | Don't mark functions that are large as inline, let compiler decide. Also, use inline rather than __inline__. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET] NETNS: Omit namespace comparision without CONFIG_NET_NS.YOSHIFUJI Hideaki2008-03-261-5/+5
| | | | | | | | | | | Introduce an inline net_eq() to compare two namespaces. Without CONFIG_NET_NS, since no namespace other than &init_net exists, it is always 1. We do not need to convert 1) inline vs inline and 2) inline vs &init_net comparisons. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
* [NET] NETNS: Omit seq_net_private->net without CONFIG_NET_NS.YOSHIFUJI Hideaki2008-03-261-14/+15
| | | | | | | Without CONFIG_NET_NS, no namespace other than &init_net exists, no need to store net in seq_net_private. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
* [NET] NETNS: Omit sock->sk_net without CONFIG_NET_NS.YOSHIFUJI Hideaki2008-03-261-2/+2
| | | | | | | | | Introduce per-sock inlines: sock_net(), sock_net_set() and per-inet_timewait_sock inlines: twsk_net(), twsk_net_set(). Without CONFIG_NET_NS, no namespace other than &init_net exists. Let's explicitly define them to help compiler optimizations. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
* [NET] NETNS: Omit net_device->nd_net without CONFIG_NET_NS.YOSHIFUJI Hideaki2008-03-261-14/+14
| | | | | | | | Introduce per-net_device inlines: dev_net(), dev_net_set(). Without CONFIG_NET_NS, no namespace other than &init_net exists. Let's explicitly define them to help compiler optimizations. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
* [IPV4] route: use read_mostlyStephen Hemminger2008-03-221-19/+17
| | | | | | | | | | | The route table parameters are set based on system memory and sysctl values that almost never change. Also the genid only changes every 10 minutes. RTprint is defined by never used. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: sk parameter is unused in ipv4_dst_blackhole.Denis V. Lunev2008-03-221-2/+2
| | | | | | | Just remove it. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: Add 'rtable' field in struct sk_buff to alias 'dst' and avoid castsEric Dumazet2008-03-051-11/+11
| | | | | | | | | | | | | | | | (Anonymous) unions can help us to avoid ugly casts. A common cast it the (struct rtable *)skb->dst one. Defining an union like : union { struct dst_entry *dst; struct rtable *rtable; }; permits to use skb->rtable in place. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: Enable all routing manipulation via netlink inside namespace.Denis V. Lunev2008-02-281-8/+8
| | | | | Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: Register /proc/net/rt_cache for each namespace.Denis V. Lunev2008-02-281-3/+21
| | | | | Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: Process /proc/net/rt_cache inside a namespace.Denis V. Lunev2008-02-281-3/+7
| | | | | | | Show routing cache for a particular namespace only. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: rt_cache_get_next should take rt_genid into account.Denis V. Lunev2008-02-281-5/+13
| | | | | | | | | In the other case /proc/net/rt_cache will look inconsistent in respect to genid. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: Process ip_rt_redirect in the correct namespace.Denis V. Lunev2008-02-281-2/+5
| | | | | Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: Use proc_create() to setup ->proc_fops firstWang Chen2008-02-281-3/+2
| | | | | | | | Use proc_create() to make sure that ->proc_fops be setup before gluing PDE to main tree. Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: route: fix crash ip_route_inputPatrick McHardy2008-02-071-1/+1
| | | | | | | | | | | | | ip_route_me_harder() may call ip_route_input() with skbs that don't have skb->dev set for skbs rerouted in LOCAL_OUT and TCP resets generated by the REJECT target, resulting in a crash when dereferencing skb->dev->nd_net. Since ip_route_input() has an input device argument, it seems correct to use that one anyway. Bug introduced in b5921910a1 (Routing cache virtualization). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4] route cache: Introduce rt_genid for smooth cache invalidationEric Dumazet2008-01-311-120/+89
| | | | | | | | | | | | | | | | | | | | | | | | | | Current ip route cache implementation is not suited to large caches. We can consume a lot of CPU when cache must be invalidated, since we currently need to evict all cache entries, and this eviction is sometimes asynchronous. min_delay & max_delay can somewhat control this asynchronism behavior, but whole thing is a kludge, regularly triggering infamous soft lockup messages. When entries are still in use, this also consumes a lot of ram, filling dst_garbage.list. A better scheme is to use a generation identifier on each entry, so that cache invalidation can be performed by changing the table identifier, without having to scan all entries. No more delayed flushing, no more stalling when secret_interval expires. Invalidated entries will then be freed at GC time (controled by ip_rt_gc_timeout or stress), or when an invalidated entry is found in a chain when an insert is done. Thus we keep a normal equilibrium. This patch : - renames rt_hash_rnd to rt_genid (and makes it an atomic_t) - Adds a new rt_genid field to 'struct rtable' (filling a hole on 64bit) - Checks entry->rt_genid at appropriate places :
* [NET]: should explicitely initialize atomic_t field in struct dst_opsEric Dumazet2008-01-311-0/+2
| | | | | | | | | | | All but one struct dst_ops static initializations miss explicit initialization of entries field. As this field is atomic_t, we should use ATOMIC_INIT(0), and not rely on atomic_t implementation. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: Routing cache virtualization.Denis V. Lunev2008-01-281-5/+16
| | | | | | | | | | | | Basically, this piece looks relatively easy. Namespace is already available on the dst entry via device and the device is safe to dereferrence. Compare it with one of a searcher and skip entry if appropriate. The only exception is ip_rt_frag_needed. So, add namespace parameter to it. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: Add namespace parameter to ip_route_output_key.Denis V. Lunev2008-01-281-3/+3
| | | | | | | Needed to propagate it down to the ip_route_output_flow. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: Add namespace parameter to ip_route_output_flow.Denis V. Lunev2008-01-281-3/+4
| | | | | | | Needed to propagate it down to the __ip_route_output_key. Signed_off_by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: Add namespace parameter to __ip_route_output_key.Denis V. Lunev2008-01-281-3/+4
| | | | | | | | This is only required to propagate it down to the ip_route_output_slow. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: Add namespace parameter to ip_route_output_slow.Denis V. Lunev2008-01-281-10/+11
| | | | | | | | This function needs a net namespace to lookup devices, fib tables, etc. in, so pass it there. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: Add namespace parameter to ip_dev_find.Denis V. Lunev2008-01-281-3/+3
| | | | | | | | in_dev_find() need a namespace to pass it to fib_get_table(), so add an argument. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: Add netns parameter to fib_select_default.Denis V. Lunev2008-01-281-1/+1
| | | | | | | | | Currently fib_select_default calls fib_get_table() with the init_net. Prepare it to provide a correct namespace to lookup default route. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: Pass correct namespace in ip_rt_get_source.Denis V. Lunev2008-01-281-1/+1
| | | | | | | | | ip_rt_get_source is the infamous place for which dst_ifdown kludges have been implemented. This means that rt->u.dst.dev can be safely dereferrenced obtain nd_net. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: Pass correct namespace in ip_route_input_slow.Denis V. Lunev2008-01-281-3/+4
| | | | | | | | The packet on the input path always has a referrence to an input network device it is passed from. Extract network namespace from it. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: Add netns parameter to fib_lookup.Denis V. Lunev2008-01-281-3/+3
| | | | | Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: Enable use of 240/4 address space.Jan Engelhardt2008-01-281-6/+6
| | | | | | | | | | This short patch modifies the IPv4 networking to enable use of the 240.0.0.0/4 (aka "class-E") address space as propsed in the internet draft draft-fuller-240space-00.txt. Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS][DST] dst: pass the dst_ops as parameter to the gc functionsDaniel Lezcano2008-01-281-3/+3
| | | | | | | | | | | | | | The garbage collection function receive the dst_ops structure as parameter. This is useful for the next incoming patchset because it will need the dst_ops (there will be several instances) and the network namespace pointer (contained in the dst_ops). The protocols which do not take care of the namespaces will not be impacted by this change (expect for the function signature), they do just ignore the parameter. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
OpenPOWER on IntegriCloud