path: root/net/ipv4/route.c
Commit message | Author | Age | Files | Lines
...
* ipv4: Fix time difference calculation in rt_bind_exception(). | David S. Miller | 2012-07-19 | 1 | -1/+1
    Reported-by: Steffen Klassert <steffen.klassert@secunet.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: fix rcu splat | Eric Dumazet | 2012-07-17 | 1 | -6/+7
    free_nh_exceptions() should use rcu_dereference_protected(..., 1)
    since it is called after one RCU grace period.

    Also add some const-ification in recent code.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Fix nexthop exception hash computation. | David S. Miller | 2012-07-17 | 1 | -4/+12
    Need to mask it with (FNHE_HASH_SIZE - 1).

    Signed-off-by: David S. Miller <davem@davemloft.net>
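As a rough illustration of the masking this fix describes, the sketch below folds a 32-bit destination address and then masks it with (FNHE_HASH_SIZE - 1) so the result is always a valid bucket index. The function name `fnhe_hash_sketch` and the folding shifts are assumptions made for illustration, not taken from the patch; the 2048-entry table size comes from the "Add FIB nexthop exceptions" commit message.

```c
#include <assert.h>
#include <stdint.h>

/* 2048 buckets, per the "Add FIB nexthop exceptions" commit message. */
#define FNHE_HASH_SIZE 2048u

/* Masking is only a valid range reduction for power-of-two sizes. */
static_assert((FNHE_HASH_SIZE & (FNHE_HASH_SIZE - 1)) == 0,
              "FNHE_HASH_SIZE must be a power of two");

/* Hypothetical helper: mix the address bits, then mask to a bucket index. */
static uint32_t fnhe_hash_sketch(uint32_t daddr)
{
    uint32_t hval = daddr;

    hval ^= (hval >> 11) ^ (hval >> 22);  /* illustrative mixing step */
    return hval & (FNHE_HASH_SIZE - 1);   /* the mask the fix calls for */
}
```

Without the final mask, any address with bits set above bit 10 could index past the end of a 2048-entry table.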
* ipv4: Add FIB nexthop exceptions. | David S. Miller | 2012-07-17 | 1 | -31/+225
    In a regime where we have subnetted route entries, we need a way to
    keep persistent storage for destination specific learned values
    such as redirects and PMTU values.

    This is implemented here via nexthop exceptions.

    The initial implementation is a 2048 entry hash table with reclaiming
    starting at chain length 5. A more sophisticated scheme can be
    devised if that proves necessary.

    Signed-off-by: David S. Miller <davem@davemloft.net>
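The scheme described in this message (a fixed-size hash table whose chains stop growing at a small depth) might be sketched in userspace C as follows. All names here (`fnhe_sketch`, `fnhe_find_or_create`, `fnhe_chain_len`), the field layout, the trivial mask-only hash, and the "recycle the least recently used entry" policy are assumptions for illustration; only the table size of 2048 and the reclaim depth of 5 come from the commit message.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define FNHE_HASH_SIZE     2048u  /* table size from the commit message */
#define FNHE_RECLAIM_DEPTH 5      /* chain length at which reclaiming starts */

/* Hypothetical exception entry: per-destination learned values. */
struct fnhe_sketch {
    struct fnhe_sketch *next;
    uint32_t daddr;       /* destination the exception applies to */
    uint32_t pmtu;        /* learned path MTU, 0 if none */
    uint32_t gw;          /* learned redirect gateway, 0 if none */
    unsigned long stamp;  /* last-use time supplied by the caller */
};

static struct fnhe_sketch *fnhe_table[FNHE_HASH_SIZE];

/* Count the entries in the chain that daddr hashes to. */
static int fnhe_chain_len(uint32_t daddr)
{
    int n = 0;
    for (struct fnhe_sketch *e = fnhe_table[daddr & (FNHE_HASH_SIZE - 1)];
         e; e = e->next)
        n++;
    return n;
}

/* Find the entry for daddr, or create one. Once a chain has reached
 * FNHE_RECLAIM_DEPTH entries, the oldest entry is recycled instead of
 * allocating, so the chain never grows past that depth. */
static struct fnhe_sketch *fnhe_find_or_create(uint32_t daddr,
                                               unsigned long now)
{
    struct fnhe_sketch **bucket = &fnhe_table[daddr & (FNHE_HASH_SIZE - 1)];
    struct fnhe_sketch *e, *oldest = NULL;
    int depth = 0;

    for (e = *bucket; e; e = e->next) {
        if (e->daddr == daddr) {
            e->stamp = now;
            return e;
        }
        if (!oldest || e->stamp < oldest->stamp)
            oldest = e;
        depth++;
    }

    if (depth >= FNHE_RECLAIM_DEPTH) {
        e = oldest;            /* reuse in place; stays linked in the chain */
        e->pmtu = 0;
        e->gw = 0;
    } else {
        e = calloc(1, sizeof(*e));
        if (!e)
            return NULL;
        e->next = *bucket;     /* push onto the head of the chain */
        *bucket = e;
    }
    e->daddr = daddr;
    e->stamp = now;
    return e;
}
```

Bounding the chain length keeps lookups cheap and caps memory at a fixed multiple of the table size, at the cost of occasionally forgetting an old exception.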
* net: Pass optional SKB and SK arguments to dst_ops->{update_pmtu,redirect}() | David S. Miller | 2012-07-17 | 1 | -8/+13
    This will be used so that we can compose a full flow key.

    Even though we have a route in this context, we need more. In the
    future the routes will be without destination address, source address,
    etc. keying. One ipv4 route will cover entire subnets, etc.

    In this environment we have to have a way to possess persistent storage
    for redirects and PMTU information. This persistent storage will exist
    in the FIB tables, and that's why we'll need to be able to rebuild a
    full lookup flow key here. Using that flow key will do a fib_lookup()
    and create/update the persistent entry.

    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Don't store a rule pointer in fib_result. | David S. Miller | 2012-07-13 | 1 | -4/+2
    We only use it to fetch the rule's tclassid, so just store the
    tclassid there instead.

    This also decreases the size of fib_result by a full 8 bytes on
    64-bit. On 32-bits it's a wash.

    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Fix warnings in ip_do_redirect() for some configurations. | David S. Miller | 2012-07-12 | 1 | -4/+6
    Reported-by: Fengguang Wu <fengguang.wu@intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Add dummy dst_ops->redirect method where needed. | David S. Miller | 2012-07-12 | 1 | -0/+5
    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Kill ip_rt_redirect(). | David S. Miller | 2012-07-11 | 1 | -44/+0
    No longer needed, as the protocol handlers now all properly
    propagate the redirect back into the routing code.

    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Add ipv4_redirect() and ipv4_sk_redirect() helper functions. | David S. Miller | 2012-07-11 | 1 | -0/+28
    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Generalize ip_do_redirect() and hook into new dst_ops->redirect. | David S. Miller | 2012-07-11 | 1 | -40/+54
    All of the redirect acceptance policy is now contained within.

    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Rearrange arguments to ip_rt_redirect() | David S. Miller | 2012-07-11 | 1 | -4/+19
    Pass in the SKB rather than just the IP addresses, so that policy
    and other aspects can reside in ip_rt_redirect() rather than
    icmp_redirect().

    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Pull redirect instantiation out into a helper function. | David S. Miller | 2012-07-11 | 1 | -15/+22
    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Remove inetpeer from routes. | David S. Miller | 2012-07-10 | 1 | -54/+6
    No longer used.

    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Calling ->cow_metrics() now is a bug. | David S. Miller | 2012-07-10 | 1 | -28/+2
    Nothing ever writes to ipv4 metrics any longer. PMTU is stored in
    rt->rt_pmtu. Dynamic TCP metrics are stored in a special TCP metrics
    cache, completely outside of the routes.

    Therefore ->cow_metrics() can simply be nothing more than a WARN_ON
    trigger so we can catch anyone who tries to add new writes to ipv4
    route metrics.

    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Kill dst_copy_metrics() call from ipv4_blackhole_route(). | David S. Miller | 2012-07-10 | 1 | -1/+0
    Blackhole routes have a COW metrics operation that returns NULL
    always, therefore this dst_copy_metrics() call did absolutely
    nothing.

    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Enforce max MTU metric at route insertion time. | David S. Miller | 2012-07-10 | 1 | -6/+1
    Rather than at every struct rtable creation.

    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Maintain redirect and PMTU info in struct rtable again. | David S. Miller | 2012-07-10 | 1 | -146/+39
    Maintaining this in the inetpeer entries was not the right way to do
    this at all.

    Signed-off-by: David S. Miller <davem@davemloft.net>
* rtnetlink: Remove ts/tsage args to rtnl_put_cacheinfo(). | David S. Miller | 2012-07-10 | 1 | -2/+1
    Nobody provides non-zero values any longer.

    Signed-off-by: David S. Miller <davem@davemloft.net>
* inet: Kill FLOWI_FLAG_PRECOW_METRICS. | David S. Miller | 2012-07-10 | 1 | -9/+2
    No longer needed. TCP writes metrics, but now in its own special
    cache that does not dirty the route metrics. Therefore there is no
    longer any reason to pre-cow metrics in this way.

    Signed-off-by: David S. Miller <davem@davemloft.net>
* inet: Minimize use of cached route inetpeer. | David S. Miller | 2012-07-10 | 1 | -16/+16
    Only use it in the absolutely required cases:

    1) COW'ing metrics
    2) ipv4 PMTU
    3) ipv4 redirects

    Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: Move timestamps from inetpeer to metrics cache. | David S. Miller | 2012-07-10 | 1 | -6/+2
    With help from Lin Ming.

    Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Don't report route RTT metric value in cache dumps. | David S. Miller | 2012-07-10 | 1 | -12/+10
    We don't maintain it dynamically any longer, so reporting it would
    be extremely misleading. Report zero instead.

    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: No need to set generic neighbour pointer. | David S. Miller | 2012-07-05 | 1 | -59/+3
    Nobody reads it any longer.

    Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Add optional SKB arg to dst_ops->neigh_lookup(). | David S. Miller | 2012-07-05 | 1 | -4/+10
    Causes the handler to use the daddr in the ipv4/ipv6 header when
    the route gateway is unspecified (local subnet).

    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Don't report neigh uptodate state in rtcache procfs. | David S. Miller | 2012-07-05 | 1 | -10/+2
    Soon routes will not have a cached neigh attached, nor will we be
    able to necessarily go directly to a neigh from an arbitrary route.

    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Make neigh lookups directly in output packet path. | David S. Miller | 2012-07-05 | 1 | -5/+1
    Do not use the dst cached neigh, we'll be getting rid of that.

    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Remove extraneous assignment of dst->tclassid. | David S. Miller | 2012-06-28 | 1 | -3/+0
    We already set it several lines above.

    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Adjust in_dev handling in fib_validate_source() | David S. Miller | 2012-06-28 | 1 | -4/+6
    Checking for in_dev being NULL is pointless.

    In fact, all of our callers have in_dev precomputed already,
    so just pass it in and remove the NULL checking.

    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Kill rt->rt_spec_dst, no longer used. | David S. Miller | 2012-06-28 | 1 | -29/+9
    Signed-off-by: David S. Miller <davem@davemloft.net>
* Revert "ipv4: tcp: dont cache unconfirmed intput dst" | David S. Miller | 2012-06-27 | 1 | -5/+3
    This reverts commit c074da2810c118b3812f32d6754bd9ead2f169e7.

    This change has several unwanted side effects:

    1) Sockets will cache the DST_NOCACHE route in sk->sk_rx_dst and we'll
       thus never create a real cached route.

    2) All TCP traffic will use DST_NOCACHE and never use the routing
       cache at all.

    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: tcp: dont cache unconfirmed intput dst | Eric Dumazet | 2012-06-27 | 1 | -3/+5
    DDOS synflood attacks hit badly IP route cache.

    On typical machines, this cache is allowed to hold up to 8 Millions dst
    entries, 256 bytes for each, for a total of 2GB of memory.

    rt_garbage_collect() triggers and tries to cleanup things.

    Eventually route cache is disabled but machine is under fire and might
    OOM and crash.

    This patch exploits the new TCP early demux, to set a nocache
    boolean in case incoming TCP frame is for a not yet ESTABLISHED or
    TIMEWAIT socket.

    This 'nocache' boolean is then used in case dst entry is not found in
    route cache, to create an unhashed dst entry (DST_NOCACHE)

    SYN-cookie-ACK sent use a similar mechanism (ipv4: tcp: dont cache
    output dst for syncookies), so after this patch, a machine is able to
    absorb a DDOS synflood attack without polluting its IP route cache.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: Hans Schillstrom <hans.schillstrom@ericsson.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Cache ip_error() routes even when not forwarding. | David S. Miller | 2012-06-26 | 1 | -11/+19
    And account for the fact that, when we are not forwarding, we should
    bump statistic counters rather than emit an ICMP response.

    RP-filter rejected lookups are still not cached.

    Since -EHOSTUNREACH and -ENETUNREACH can now no longer be seen in
    ip_rcv_finish(), remove those checks.

    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Remove unnecessary code from rt_check_expire(). | David S. Miller | 2012-06-26 | 1 | -23/+11
    IPv4 routing cache entries no longer use dst->expires, because the
    metrics, PMTU, and redirect information are stored in the inetpeer
    cache.

    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: tcp: dont cache output dst for syncookies | Eric Dumazet | 2012-06-22 | 1 | -1/+4
    Don't cache output dst for syncookies, as this adds pressure on IP route
    cache and rcu subsystem for no gain.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: Hans Schillstrom <hans.schillstrom@ericsson.com>
    Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Cap ADVMSS metric in the FIB rather than the routing cache. | David S. Miller | 2012-06-17 | 1 | -2/+0
    It makes no sense to execute this limit test every time we create a
    routing cache entry.

    We can't simply error out on these things since we've silently
    accepted and truncated them forever.

    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Handle PMTU in all ICMP error handlers. | David S. Miller | 2012-06-14 | 1 | -0/+28
    With ip_rt_frag_needed() removed, we have to explicitly update PMTU
    information in every ICMP error handler.

    Create two helper functions to facilitate this.

    1) ipv4_sk_update_pmtu()

       This updates the PMTU when we have a socket context to
       work with.

    2) ipv4_update_pmtu()

       Raw version, used when no socket context is available. For this
       interface, we essentially just pass in explicit arguments for
       the flow identity information we would have extracted from the
       socket.

       And you'll notice that ipv4_sk_update_pmtu() is simply implemented
       in terms of ipv4_update_pmtu()

    Note that __ip_route_output_key() is used, rather than something like
    ip_route_output_flow() or ip_route_output_key(). This is because we
    absolutely do not want to end up with a route that does IPSEC
    encapsulation and the like. Instead, we only want the route that
    would get us to the node described by the outermost IP header.

    Reported-by: Steffen Klassert <steffen.klassert@secunet.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Add interface option to enable routing of 127.0.0.0/8 | Thomas Graf | 2012-06-12 | 1 | -9/+21
    Routing of 127/8 is traditionally forbidden, we consider
    packets from that address block martian when routing and do
    not process corresponding ARP requests.

    This is a sane default but renders a huge address space
    practically unusable.

    The RFC states that no address within the 127/8 block should
    ever appear on any network anywhere but it does not forbid
    the use of such addresses outside of the loopback device in
    particular. For example to address a pool of virtual guests
    behind a load balancer.

    This patch adds a new interface option 'route_localnet'
    enabling routing of the 127/8 address block and processing
    of ARP requests on a specific interface.

    Note that for the feature to work, the default local route
    covering 127/8 dev lo needs to be removed.

    Example:
      $ sysctl -w net.ipv4.conf.eth0.route_localnet=1
      $ ip route del 127.0.0.0/8 dev lo table local
      $ ip addr add 127.1.0.1/16 dev eth0
      $ ip route flush cache

    V2: Fix invalid check to auto flush cache (thanks davem)

    Signed-off-by: Thomas Graf <tgraf@suug.ch>
    Acked-by: Neil Horman <nhorman@tuxdriver.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* inet: Avoid potential NULL peer dereference. | David S. Miller | 2012-06-11 | 1 | -5/+6
    We handle NULL in rt{,6}_set_peer but then our caller will try to pass
    that NULL pointer into inet_putpeer() which isn't ready for it.

    Fix this by moving the NULL check one level up, and then remove the
    now unnecessary NULL check from inetpeer_ptr_set_peer().

    Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* inet: Use FIB table peer roots in routes. | David S. Miller | 2012-06-11 | 1 | -2/+6
    Signed-off-by: David S. Miller <davem@davemloft.net>
* inet: Add family scope inetpeer flushes. | David S. Miller | 2012-06-11 | 1 | -1/+1
    This implementation can deal with having many inetpeer roots, which is
    a necessary prerequisite for per-FIB table rooted peer tables.

    Each family (AF_INET, AF_INET6) has a sequence number which we bump
    when we get a family invalidation request.

    Each peer lookup cheaply checks whether the flush sequence of the
    root we are using is out of date, and if so flushes it and updates
    the sequence number.

    Signed-off-by: David S. Miller <davem@davemloft.net>
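The lazy, sequence-number based flush described above can be sketched in a few lines of userspace C. The names (`family_flush_seq`, `peer_root_sketch`, `family_invalidate`, `root_check_flush`) and the `nr_peers` counter standing in for the peer tree are assumptions for illustration; the real code also has to handle locking, which this sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for AF_INET / AF_INET6 (hypothetical enum for this sketch). */
enum { FAM_INET, FAM_INET6, FAM_MAX };

/* One flush sequence number per address family. */
static unsigned int family_flush_seq[FAM_MAX];

/* Hypothetical peer root: remembers the family sequence it last synced to. */
struct peer_root_sketch {
    int family;
    unsigned int flush_seq;  /* family sequence seen at the last flush */
    int nr_peers;            /* stand-in for the tree of cached peers */
};

/* A family invalidation request only bumps a counter: O(1), no tree walk. */
static void family_invalidate(int family)
{
    assert(family >= 0 && family < FAM_MAX);
    family_flush_seq[family]++;
}

/* Run at the start of each lookup: if the root is out of date, flush it
 * and record the new sequence. Returns true when a flush happened. */
static bool root_check_flush(struct peer_root_sketch *root)
{
    unsigned int cur = family_flush_seq[root->family];

    if (root->flush_seq == cur)
        return false;        /* cheap common case: nothing to do */
    root->nr_peers = 0;      /* drop everything cached under this root */
    root->flush_seq = cur;
    return true;
}
```

The design choice is that an invalidation does not need to know how many roots exist; each root catches up on its own the next time it is used.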
* ipv4: Kill ip_rt_frag_needed(). | David S. Miller | 2012-06-11 | 1 | -61/+0
    There is zero point to this function.

    Its only real substance is to perform an extremely outdated BSD4.2
    ICMP check, which we can safely remove. If you really have a MTU
    limited link being routed by a BSD4.2 derived system, here's a nickel
    go buy yourself a real router.

    The other actions of ip_rt_frag_needed(), checking and conditionally
    updating the peer, are done by the per-protocol handlers of the ICMP
    event.

    TCP, UDP, et al. have a handler which will receive this event and
    transmit it back into the associated route via dst_ops->update_pmtu().

    This simplification is important, because it eliminates the one place
    where we do not have a proper route context in which to make an
    inetpeer lookup.

    Signed-off-by: David S. Miller <davem@davemloft.net>
* inet: Hide route peer accesses behind helpers. | David S. Miller | 2012-06-11 | 1 | -23/+33
    We encode the pointer(s) into an unsigned long with one state bit.

    The state bit is used so we can store the inetpeer tree root to use
    when resolving the peer later.

    Later the peer roots will be per-FIB table, and this change works to
    facilitate that.

    Signed-off-by: David S. Miller <davem@davemloft.net>
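One common way to pack "a pointer plus one state bit" into a single word is to use the low bit, which is always zero for suitably aligned objects. The sketch below shows that technique in userspace C; it uses `uintptr_t` where the commit message says unsigned long, and every name in it (`PEER_IS_BASE`, `peer_sketch`, `peer_base_sketch`, the `ptr_*` helpers) is hypothetical rather than taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for a resolved peer and a peer tree root. */
struct peer_sketch      { int refcnt; };
struct peer_base_sketch { int total;  };

/* State bit: set when the word holds a tree root instead of a peer. */
#define PEER_IS_BASE ((uintptr_t)1)

/* Store a tree root, to be used later when the peer is resolved. */
static uintptr_t ptr_encode_base(struct peer_base_sketch *base)
{
    assert(((uintptr_t)base & PEER_IS_BASE) == 0);  /* needs alignment >= 2 */
    return (uintptr_t)base | PEER_IS_BASE;
}

/* Store an already resolved peer (state bit clear). */
static uintptr_t ptr_encode_peer(struct peer_sketch *peer)
{
    assert(((uintptr_t)peer & PEER_IS_BASE) == 0);
    return (uintptr_t)peer;
}

static bool ptr_is_peer(uintptr_t val)
{
    return (val & PEER_IS_BASE) == 0;
}

/* Returns the peer, or NULL if the word currently holds a root. */
static struct peer_sketch *ptr_to_peer(uintptr_t val)
{
    return ptr_is_peer(val) ? (struct peer_sketch *)val : NULL;
}

/* Returns the root, or NULL if the word currently holds a peer. */
static struct peer_base_sketch *ptr_to_base(uintptr_t val)
{
    if (ptr_is_peer(val))
        return NULL;
    return (struct peer_base_sketch *)(val & ~PEER_IS_BASE);
}
```

Hiding the encoding behind helpers like these means callers never touch the raw word, so the representation can change later without touching every user.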
* inet: Pass inetpeer root into inet_getpeer*() interfaces. | David S. Miller | 2012-06-09 | 1 | -3/+3
    Otherwise we reference potentially non-existing members when
    ipv6 is disabled.

    Signed-off-by: David S. Miller <davem@davemloft.net>
* inet: Consolidate inetpeer_invalidate_tree() interfaces. | David S. Miller | 2012-06-09 | 1 | -2/+2
    We only need one interface for this operation, since we always know
    which inetpeer root we want to flush.

    Signed-off-by: David S. Miller <davem@davemloft.net>
* inet: Initialize per-netns inetpeer roots in net/ipv{4,6}/route.c | David S. Miller | 2012-06-09 | 1 | -0/+25
    Instead of net/ipv4/inetpeer.c

    Signed-off-by: David S. Miller <davem@davemloft.net>
* inet: Create and use rt{,6}_get_peer_create(). | David S. Miller | 2012-06-08 | 1 | -26/+9
    There's a lot of places that open-code rt{,6}_get_peer() only because
    they want to set 'create' to one. So add an rt{,6}_get_peer_create()
    for their sake.

    There were also a few spots open-coding plain rt{,6}_get_peer() and
    those are transformed here as well.

    Signed-off-by: David S. Miller <davem@davemloft.net>
* inetpeer: add parameter net for inet_getpeer_v4,v6 | Gao feng | 2012-06-08 | 1 | -3/+5
    add struct net as a parameter of inet_getpeer_v[4,6],
    use net to replace &init_net.

    and modify some places to provide net for inet_getpeer_v[4,6]

    Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* inetpeer: add namespace support for inetpeer | Gao feng | 2012-06-08 | 1 | -1/+1
    now inetpeer doesn't support namespace, the information will
    be leaking across namespace.

    this patch move the global vars v4_peers and v6_peers to
    netns_ipv4 and netns_ipv6 as a field peers.

    add struct pernet_operations inetpeer_ops to initial pernet
    inetpeer data.

    and change family_to_base and inet_getpeer to support namespace.

    Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* mm: add a low limit to alloc_large_system_hash | Tim Bird | 2012-05-24 | 1 | -0/+1
    UDP stack needs a minimum hash size value for proper operation and also
    uses alloc_large_system_hash() for proper NUMA distribution of its hash
    tables and automatic sizing depending on available system memory.

    On some low memory situations, udp_table_init() must ignore the
    alloc_large_system_hash() result and reallocs a bigger memory area.

    As we cannot easily free old hash table, we leak it and kmemleak can
    issue a warning.

    This patch adds a low limit parameter to alloc_large_system_hash() to
    solve this problem.

    We then specify UDP_HTABLE_SIZE_MIN for UDP/UDPLite hash table
    allocation.

    Reported-by: Mark Asselstine <mark.asselstine@windriver.com>
    Reported-by: Tim Bird <tim.bird@am.sony.com>
    Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
    Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
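The idea of a low limit on an automatically sized hash table can be sketched as a clamp applied after the memory-based estimate. The function below is a hypothetical userspace model, not the signature or logic of alloc_large_system_hash(); the names `hash_entries_sketch` and `roundup_pow2_sketch`, the parameters, and the convention that a high limit of 0 means "no upper bound" are all assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round n up to the next power of two (returns 1 for n == 0). The second
 * loop condition stops the shift before it could overflow size_t. */
static size_t roundup_pow2_sketch(size_t n)
{
    size_t p = 1;

    while (p < n && p <= SIZE_MAX / 2)
        p <<= 1;
    return p;
}

/* Hypothetical sizing helper: estimate a bucket count from a memory budget,
 * then clamp it so callers that need a minimum size always get one.
 * high_limit == 0 means "no upper bound" in this sketch. */
static size_t hash_entries_sketch(size_t mem_bytes, size_t bucket_size,
                                  size_t low_limit, size_t high_limit)
{
    size_t n;

    assert(bucket_size > 0);
    n = roundup_pow2_sketch(mem_bytes / bucket_size);

    if (high_limit && n > high_limit)
        n = high_limit;
    if (n < low_limit)
        n = low_limit;   /* the floor applies even on low-memory systems */
    return n;
}
```

With a floor built into the sizing step, a caller such as the UDP table setup would not need to discard an undersized table and allocate a second one, which is the leak the commit message describes.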