summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* tipc: Eliminate dynamic allocation of broadcast link data structuresAllan Stephens2011-12-273-31/+13
| | | | | | | | | | | | | | | | Creates global variables to hold the broadcast link's pseudo-bearer and pseudo-link structures, rather than allocating them dynamically. There is only a single instance of each structure, and changing over to static allocation allows elimination of code to handle the cases where dynamic allocation was unsuccessful. The memset in the teardown code may look like they aren't used, but the same teardown code is run when there is a non-fatal error at init-time, so that stale data isn't present when the user fixes the cause of the soft error. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
* tipc: Eliminate useless check when network address is assignedAllan Stephens2011-12-271-7/+5
| | | | | | | | | | Gets rid of an unnecessary check in the routine that updates the port id of a node's name publications when the node is assigned a network address, since the routine is only invoked if the new address is different from the existing one. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
* tipc: Minor correction to TIPC module unloadingAllan Stephens2011-12-271-1/+1
| | | | | | | | | | Modifies TIPC's module unloading logic to switch itself into "single node" mode before starting to terminate networking support. This helps to ensure that no operations that require TIPC to be in "networking" mode can initiate once unloading starts. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
* tipc: Eliminate useless memset operations in Ethernet media supportAllan Stephens2011-12-271-3/+0
| | | | | | | | | | | Gets rid of two pointless operations that zero out the array used to record information about TIPC's Ethernet bearers. There is no need to initialize the array on start up since it is a global variable that is already zero'd out, and there is no need to zero it out on exit because the array is never referenced again. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
* tipc: Do timely cleanup of disabled Ethernet bearer resourcesAllan Stephens2011-12-271-12/+27
| | | | | | | | | | | Modifies Ethernet bearer disable logic to break the association between the bearer and its device driver at the time the bearer is disabled, rather than when the TIPC module is unloaded. This allows the array entry used by the disabled bearer to be re-used if the same bearer (or a different one) is subsequently enabled. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
* tipc: Minor optimization to deactivation of Ethernet media suppotAllan Stephens2011-12-272-5/+1
| | | | | | | | | | | Change TIPC's shutdown code to deactivate generic networking support before terminating Ethernet media support. The deactivation of generic networking support causes all existing bearers to be destroyed, meaning the Ethernet media termination routine no longer has to bother marking them as unavailable. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
* tipc: Revise comment justifying release of configuration spinlockAllan Stephens2011-12-271-7/+6
| | | | | | | | | Comment-only change to better explain why TIPC's configuration lock is temporarily released while activating support for network interfaces, and why the existing activation code doesn't require rework. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
* tipc: Allow run-time alteration of default link settingsAllan Stephens2011-12-273-49/+127
| | | | | | | | | | | | | | | | | | | | | Permits run-time alteration of default link settings on a per-media and per-bearer basis, in addition to the existing per-link basis. The following syntax can now be used: tipc-config -lt=<link-name|bearer-name|media-name>/<tolerance> tipc-config -lp=<link-name|bearer-name|media-name>/<priority> tipc-config -lw=<link-name|bearer-name|media-name>/<window> Note that changes to the default settings for a given media type has no effect on the default settings used by existing bearers. Similarly, changes to default bearer settings has no effect on existing link endpoints that utilize that interface. Thanks to Florian Westphal <fw@strlen.de> for his contributions to the development of this enhancement. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
* tipc: Ignore neighbor discovery messages containing invalid addressAllan Stephens2011-12-271-0/+3
| | | | | | | | | | Adds a check to ensure that TIPC ignores an incoming neighbor discovery message that specifies an invalid media address as its source. The check ensures that the source address is a valid, non-broadcast address that could legally be used by a neighboring link endpoint. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
* tipc: Hide media-specific addressing details from generic bearer codeAllan Stephens2011-12-276-41/+41
| | | | | | | | | | | | | | | | Reworks TIPC's media address data structure and associated processing routines to transfer all media-specific details of address conversion to the associated TIPC media adaptation code. TIPC's generic bearer code now only needs to know which media type an address is associated with and whether or not it is a broadcast address, and totally ignores the "value" field that contains the actual media-specific addressing info. These changes eliminate the need for a number of endianness conversion operations and will make it easier for TIPC to support new media types in the future. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
* tipc: Add new address conversion routines for Ethernet mediaAllan Stephens2011-12-272-3/+79
| | | | | | | | | | | | | Enhances TIPC's Ethernet media support to provide 3 new address conversion routines, which allow TIPC to interpret an address that is in string form and to convert an address to and from the 20 byte format used in TIPC's neighbor discovery messages. These routines are pre-requisites to a follow on commit that hides all media-specific addressing details from TIPC's generic bearer code. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
* tipc: Improve handling of media address printing errorsAllan Stephens2011-12-273-15/+12
| | | | | | | | | | | Enhances conversion of a media address to printable form so that an unconvertable address will be displayed as a string of hex digits, rather than not being displayed at all. (Also removes a pointless check for the existence of the media-specific address conversion routine, since the routine is not optional.) Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
* tipc: Streamline media registration error checkingAllan Stephens2011-12-271-29/+8
| | | | | | | | | | | | | | | Simplifies error handling performed during media registration, since TIPC no longer supports the dynamic addition of new media types that are potentially error-prone. These simplifications include the following: 1) No longer check for premature registration of a new media type. 2) No longer check for negative link priority values (which was pointless since such values are unsigned, and could cause a compiler warning). 3) No longer generate a warning describing the exact cause of any registration failure (just warns that overall registration failed). Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
* tipc: Eliminate duplication of media structuresAllan Stephens2011-12-271-20/+15
| | | | | | | | | | Changes TIPC's list of registered media types from an array of media structures to an array of pointers to media structures. This eliminates the need to copy of the contents of the structure passed in during media registration. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
* tipc: Optimize detection of duplicate media registrationAllan Stephens2011-12-271-19/+21
| | | | | | | | | | | Streamlines the detection of an attempt to register a TIPC media structure using an already registered name or type identifier. The revised logic now reuses an existing routine to detect an existing name and no longer unnecessarily manipulates the media type counter during an unsuccessful registration attempt. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
* tipc: Register new media using pre-compiled structureAllan Stephens2011-12-273-66/+45
| | | | | | | | | Speeds up the registration of TIPC media types by passing in a structure containing the required information, rather than by passing in the various fields describing the media type individually. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
* tipc: Enable use by containers having their own network namespaceAllan Stephens2011-12-271-3/+0
| | | | | | | | | | Permits a Linux container to use TIPC sockets even when it has its own network namespace defined by removing the check that prohibits such use. This makes it possible for users who wish to isolate their container network traffic from normal network traffic to utilize TIPC. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
* bonding: document undocumented active_slave sysfs entry.Nicolas de Pesloüan2011-12-261-0/+17
| | | | | | | | | | | v2, based on Jay's review. I kept the 'link must be up' part, because this is enforced in the code. Signed-off-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: Kill useless route tracing bits in net/ipv6/route.cDavid S. Miller2011-12-261-20/+2
| | | | | | | | RDBG() wasn't even used, and the messages printed by RT6_DEBUG() were far from useful. Just get rid of all this stuff, we can replace it with something more suitable if we want. Signed-off-by: David S. Miller <davem@davemloft.net>
* mlx4: Add missing include of linux/slab.hAxel Lin2011-12-261-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Include linux/slab.h to fix below build error: CC drivers/net/ethernet/mellanox/mlx4/resource_tracker.o drivers/net/ethernet/mellanox/mlx4/resource_tracker.c: In function 'mlx4_init_resource_tracker': drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:233: error: implicit declaration of function 'kzalloc' drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:234: warning: assignment makes pointer from integer without a cast drivers/net/ethernet/mellanox/mlx4/resource_tracker.c: In function 'mlx4_free_resource_tracker': drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:264: error: implicit declaration of function 'kfree' drivers/net/ethernet/mellanox/mlx4/resource_tracker.c: In function 'alloc_qp_tr': drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:370: warning: assignment makes pointer from integer without a cast drivers/net/ethernet/mellanox/mlx4/resource_tracker.c: In function 'alloc_mtt_tr': drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:386: warning: assignment makes pointer from integer without a cast drivers/net/ethernet/mellanox/mlx4/resource_tracker.c: In function 'alloc_mpt_tr': drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:402: warning: assignment makes pointer from integer without a cast drivers/net/ethernet/mellanox/mlx4/resource_tracker.c: In function 'alloc_eq_tr': drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:417: warning: assignment makes pointer from integer without a cast drivers/net/ethernet/mellanox/mlx4/resource_tracker.c: In function 'alloc_cq_tr': drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:431: warning: assignment makes pointer from integer without a cast drivers/net/ethernet/mellanox/mlx4/resource_tracker.c: In function 'alloc_srq_tr': drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:446: warning: assignment makes pointer from integer without a cast drivers/net/ethernet/mellanox/mlx4/resource_tracker.c: In function 'alloc_counter_tr': drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:461: warning: assignment makes pointer from integer without a cast drivers/net/ethernet/mellanox/mlx4/resource_tracker.c: In function 'add_res_range': drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:521: warning: assignment makes pointer from integer without a cast drivers/net/ethernet/mellanox/mlx4/resource_tracker.c: In function 'mac_add_to_slave': drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:1193: warning: assignment makes pointer from integer without a cast drivers/net/ethernet/mellanox/mlx4/resource_tracker.c: In function 'add_mcg_res': drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:2521: warning: assignment makes pointer from integer without a cast make[5]: *** [drivers/net/ethernet/mellanox/mlx4/resource_tracker.o] Error 1 make[4]: *** [drivers/net/ethernet/mellanox/mlx4] Error 2 make[3]: *** [drivers/net/ethernet/mellanox] Error 2 make[2]: *** [drivers/net/ethernet] Error 2 make[1]: *** [drivers/net] Error 2 make: *** [drivers] Error 2 Signed-off-by: Axel Lin <axel.lin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* unix: If we happen to find peer NULL when diag dumping, write zero.David S. Miller2011-12-261-2/+1
| | | | | | | Otherwise we leave uninitialized kernel memory in there. Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* unix_diag: Fix incoming connections nla lengthPavel Emelyanov2011-12-261-1/+2
| | | | | | | | The NLA_PUT macro should accept the actual attribute length, not the amount of elements in array :( Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'nf-next' of git://1984.lsi.us.es/net-nextDavid S. Miller2011-12-2559-378/+1168
|\
| * netfilter: xtables: add nfacct match to support extended accountingPablo Neira Ayuso2011-12-255-0/+101
| | | | | | | | | | | | | | | | | | | | This patch adds the match that allows to perform extended accounting. It requires the new nfnetlink_acct infrastructure. # iptables -I INPUT -p tcp --sport 80 -m nfacct --nfacct-name http-traffic # iptables -I OUTPUT -p tcp --dport 80 -m nfacct --nfacct-name http-traffic Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: add extended accounting infrastructure over nfnetlinkPablo Neira Ayuso2011-12-256-1/+400
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We currently have two ways to account traffic in netfilter: - iptables chain and rule counters: # iptables -L -n -v Chain INPUT (policy DROP 3 packets, 867 bytes) pkts bytes target prot opt in out source destination 8 1104 ACCEPT all -- lo * 0.0.0.0/0 0.0.0.0/0 - use flow-based accounting provided by ctnetlink: # conntrack -L tcp 6 431999 ESTABLISHED src=192.168.1.130 dst=212.106.219.168 sport=58152 dport=80 packets=47 bytes=7654 src=212.106.219.168 dst=192.168.1.130 sport=80 dport=58152 packets=49 bytes=66340 [ASSURED] mark=0 use=1 While trying to display real-time accounting statistics, we require to pool the kernel periodically to obtain this information. This is OK if the number of flows is relatively low. However, in case that the number of flows is huge, we can spend a considerable amount of cycles to iterate over the list of flows that have been obtained. Moreover, if we want to obtain the sum of the flow accounting results that match some criteria, we have to iterate over the whole list of existing flows, look for matchings and update the counters. This patch adds the extended accounting infrastructure for nfnetlink which aims to allow displaying real-time traffic accounting without the need of complicated and resource-consuming implementation in user-space. Basically, this new infrastructure allows you to create accounting objects. One accounting object is composed of packet and byte counters. In order to manipulate create accounting objects, you require the new libnetfilter_acct library. It contains several examples of use: libnetfilter_acct/examples# ./nfacct-add http-traffic libnetfilter_acct/examples# ./nfacct-get http-traffic = { pkts = 000000000000, bytes = 000000000000 }; Then, you can use one of this accounting objects in several iptables rules using the new nfacct match (which comes in a follow-up patch): # iptables -I INPUT -p tcp --sport 80 -m nfacct --nfacct-name http-traffic # iptables -I OUTPUT -p tcp --dport 80 -m nfacct --nfacct-name http-traffic The idea is simple: if one packet matches the rule, the nfacct match updates the counters. Thanks to Patrick McHardy, Eric Dumazet, Changli Gao for reviewing and providing feedback for this contribution. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: ctnetlink: get and zero operations must be atomicPablo Neira Ayuso2011-12-241-45/+39
| | | | | | | | | | | | | | | | | | | | The get and zero operations have to be done in an atomic context, otherwise counters added between them will be lost. This problem was spotted by Changli Gao while discussing the nfacct infrastructure. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: ctnetlink: remove dead NAT codePatrick McHardy2011-12-239-25/+1
| | | | | | | | | | | | | | | | | | | | The NAT range to nlattr conversation callbacks and helpers are entirely dead code and are also useless since there are no NAT ranges in conntrack context, they are only used for initially selecting a tuple. The final NAT information is contained in the selected tuples of the conntrack entry. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: nf_nat: remove obsolete check in nf_nat_mangle_udp_packet()Patrick McHardy2011-12-231-6/+0
| | | | | | | | | | | | | | | | | | | | The packet size check originates from a time when UDP helpers could accidentally mangle incorrect packets (NEWNAT) and is unnecessary nowadays since the conntrack helpers invoke the NAT helpers for the proper packet directly. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: nf_nat: remove obsolete code from nf_nat_icmp_reply_translation()Patrick McHardy2011-12-231-13/+1
| | | | | | | | | | | | | | | | | | | | The inner tuple that is extracted from the packet is unused. The code also doesn't have any useful side-effects like verifying the packet does contain enough data to extract the inner tuple since conntrack already does the same, so remove it. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: nat: remove module reference counting from NAT protocolsPatrick McHardy2011-12-2310-32/+3
| | | | | | | | | | | | | | | | | | | | | | | | The only remaining user of NAT protocol module reference counting is NAT ctnetlink support. Since this is a fairly short sequence of code, convert over to use RCU and remove module reference counting. Module unregistration is already protected by RCU using synchronize_rcu(), so no further changes are necessary. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: nf_nat: add missing nla_policy entry for CTA_NAT_PROTO attributePatrick McHardy2011-12-231-0/+1
| | | | | | | | | | Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: nf_nat: use hash random for bysource hashPatrick McHardy2011-12-232-1/+2
| | | | | | | | | | | | | | | | Use nf_conntrack_hash_rnd in NAT bysource hash to avoid hash chain attacks. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: nf_nat: export NAT definitions to userspacePatrick McHardy2011-12-2329-194/+185
| | | | | | | | | | | | | | | | | | | | | | Export the NAT definitions to userspace. So far userspace (specifically, iptables) has been copying the headers files from include/net. Also rename some structures and definitions in preparation for IPv6 NAT. Since these have never been officially exported, this doesn't affect existing userspace code. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: rework user-space expectation helper supportPablo Neira Ayuso2011-12-237-48/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This partially reworks bc01befdcf3e40979eb518085a075cbf0aacede0 which added userspace expectation support. This patch removes the nf_ct_userspace_expect_list since now we force to use the new iptables CT target feature to add the helper extension for conntracks that have attached expectations from userspace. A new version of the proof-of-concept code to implement userspace helpers from userspace is available at: http://people.netfilter.org/pablo/userspace-conntrack-helpers/nf-ftp-helper-POC.tar.bz2 This patch also modifies the CT target to allow to set the conntrack's userspace helper status flags. This flag is used to tell the conntrack system to explicitly allocate the helper extension. This helper extension is useful to link the userspace expectations with the master conntrack that is being tracked from one userspace helper. This feature fixes a problem in the current approach of the userspace helper support. Basically, if the master conntrack that has got a userspace expectation vanishes, the expectations point to one invalid memory address. Thus, triggering an oops in the expectation deletion event path. I decided not to add a new revision of the CT target because I only needed to add a new flag for it. I'll document in this issue in the iptables manpage. I have also changed the return value from EINVAL to EOPNOTSUPP if one flag not supported is specified. Thus, in the future adding new features that only require a new flag can be added without a new revision. There is no official code using this in userspace (apart from the proof-of-concept) that uses this infrastructure but there will be some by beginning 2012. Reported-by: Sam Roberts <vieuxtech@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: ctnetlink: support individual atomic-get-and-reset of countersPablo Neira Ayuso2011-12-181-0/+11
| | | | | | | | | | | | | | This allows to use the get operation to atomically get-and-reset counters. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: ctnetlink: use expect instead of master tuple in get operationPablo Neira Ayuso2011-12-181-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the expect tuple (if possible) instead of the master tuple for the get operation. If two or more expectations come from the same master, the returned expectation may not be the one that user-space is requesting. This is how it works for the expect deletion operation. Although I think that nobody has been seriously using this. We accept both possibilities, using the expect tuple if possible. I decided to do it like this to avoid breaking backward compatibility. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: nf_conntrack: use atomic64 for accounting countersEric Dumazet2011-12-185-33/+33
| | | | | | | | | | | | | | | | | | | | | | We can use atomic64_t infrastructure to avoid taking a spinlock in fast path, and remove inaccuracies while reading values in ctnetlink_dump_counters() and connbytes_mt() on 32bit arches. Suggested by Pablo. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * IPVS: Modify the SH scheduler to use weightsMichael Maxim2011-12-132-1/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Modify the algorithm to build the source hashing hash table to add extra slots for destinations with higher weight. This has the effect of allowing an IPVS SH user to give more connections to hosts that have been configured to have a higher weight. The reason for the Kconfig change is because the size of the hash table becomes more relevant/important if you decide to use the weights in the manner this patch lets you. It would be conceivable that someone might need to increase the size of that table to accommodate their configuration, so it will be handy to be able to do that through the regular configuration system instead of editing the source. Signed-off-by: Michael Maxim <mike@okcupid.com> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: add ipv6 reverse path filter matchFlorian Westphal2011-12-133-0/+144
| | | | | | | | | | | | | | | | | | This is not merged with the ipv4 match into xt_rpfilter.c to avoid ipv6 module dependency issues. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * ipv6: add ip6_route_lookupFlorian Westphal2011-12-042-0/+9
| | | | | | | | | | | | | | | | | | | | like rt6_lookup, but allows caller to pass in flowi6 structure. Will be used by the upcoming ipv6 netfilter reverse path filter match. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: add ipv4 reverse path filter matchFlorian Westphal2011-12-044-0/+175
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This tries to do the same thing as fib_validate_source(), but differs in several aspects. The most important difference is that the reverse path filter built into fib_validate_source uses the oif as iif when performing the reverse lookup. We do not do this, as the oif is not yet known by the time the PREROUTING hook is invoked. We can't wait until FORWARD chain because by the time FORWARD is invoked ipv4 forward path may have already sent icmp messages is response to to-be-discarded-via-rpfilter packets. To avoid the such an additional lookup in PREROUTING, Patrick McHardy suggested to attach the path information directly in the match (i.e., just do what the standard ipv4 path does a bit earlier in PREROUTING). This works, but it also has a few caveats. Most importantly, when using marks in PREROUTING to re-route traffic based on the nfmark, -m rpfilter would have to be used after the nfmark has been set; otherwise the nfmark would have no effect (because the route is already attached). Another problem would be interaction with -j TPROXY, as this target sets an nfmark and uses ACCEPT instead of continue, i.e. such a version of -m rpfilter cannot be used for the initial to-be-intercepted packets. In case in turns out that the oif is required, we can add Patricks suggestion with a new match option (e.g. --rpf-use-oif) to keep ruleset compatibility. Another difference to current builtin ipv4 rpfilter is that packets subject to ipsec transformation are not automatically excluded. If you want this, simply combine -m rpfilter with the policy match. Packets arriving on loopback interfaces always match. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * net: ipv4: export fib_lookup and fib_table_lookupFlorian Westphal2011-12-042-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | The reverse path filter module will use fib_lookup. If CONFIG_IP_MULTIPLE_TABLES is not set, fib_lookup is only a static inline helper that calls fib_table_lookup, so export that too. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | rfs: better sizing of dev_flow_tableEric Dumazet2011-12-242-21/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Aim of this patch is to provide full range of rps_flow_cnt on 64bit arches. Theorical limit on number of flows is 2^32 Fix some buggy RPS/RFS macros as well. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Tom Herbert <therbert@google.com> CC: Xi Wang <xi.wang@gmail.com> CC: Laurent Chavey <chavey@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | netlink: Undo const marker in netlink_is_kernel().David S. Miller2011-12-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | We can't do this without propagating the const to nlk_sk() too, otherwise: net/netlink/af_netlink.c: In function ‘netlink_is_kernel’: net/netlink/af_netlink.c:103:2: warning: passing argument 1 of ‘nlk_sk’ discards ‘const’ qualifier from pointer target type [enabled by default] net/netlink/af_netlink.c:96:36: note: expected ‘struct sock *’ but argument is of type ‘const struct sock *’ Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2011-12-23424-3445/+6029
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: net/bluetooth/l2cap_core.c Just two overlapping changes, one added an initialization of a local variable, and another change added a new local variable. Signed-off-by: David S. Miller <davem@davemloft.net>
| * \ Merge branch 'nf' of git://1984.lsi.us.es/netDavid S. Miller2011-12-231-3/+3
| |\ \
| | * | netfilter: xt_connbytes: handle negation correctlyFlorian Westphal2011-12-231-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | "! --connbytes 23:42" should match if the packet/byte count is not in range. As there is no explict "invert match" toggle in the match structure, userspace swaps the from and to arguments (i.e., as if "--connbytes 42:23" were given). However, "what <= 23 && what >= 42" will always be false. Change things so we use "||" in case "from" is larger than "to". This change may look like it breaks backwards compatibility when "to" is 0. However, older iptables binaries will refuse "connbytes 42:0", and current releases treat it to mean "! --connbytes 0:42", so we should be fine. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | net: relax rcvbuf limitsEric Dumazet2011-12-233-10/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | skb->truesize might be big even for a small packet. Its even bigger after commit 87fb4b7b533 (net: more accurate skb truesize) and big MTU. We should allow queueing at least one packet per receiver, even with a low RCVBUF setting. Reported-by: Michal Simek <monstr@monstr.eu> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | rps: fix insufficient bounds checking in store_rps_dev_flow_table_cnt()Xi Wang2011-12-221-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Setting a large rps_flow_cnt like (1 << 30) on 32-bit platform will cause a kernel oops due to insufficient bounds checking. if (count > 1<<30) { /* Enforce a limit to prevent overflow */ return -EINVAL; } count = roundup_pow_of_two(count); table = vmalloc(RPS_DEV_FLOW_TABLE_SIZE(count)); Note that the macro RPS_DEV_FLOW_TABLE_SIZE(count) is defined as: ... + (count * sizeof(struct rps_dev_flow)) where sizeof(struct rps_dev_flow) is 8. (1 << 30) * 8 will overflow 32 bits. This patch replaces the magic number (1 << 30) with a symbolic bound. Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: introduce DST_NOPEER dst flagEric Dumazet2011-12-224-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Chris Boot reported crashes occurring in ipv6_select_ident(). [ 461.457562] RIP: 0010:[<ffffffff812dde61>] [<ffffffff812dde61>] ipv6_select_ident+0x31/0xa7 [ 461.578229] Call Trace: [ 461.580742] <IRQ> [ 461.582870] [<ffffffff812efa7f>] ? udp6_ufo_fragment+0x124/0x1a2 [ 461.589054] [<ffffffff812dbfe0>] ? ipv6_gso_segment+0xc0/0x155 [ 461.595140] [<ffffffff812700c6>] ? skb_gso_segment+0x208/0x28b [ 461.601198] [<ffffffffa03f236b>] ? ipv6_confirm+0x146/0x15e [nf_conntrack_ipv6] [ 461.608786] [<ffffffff81291c4d>] ? nf_iterate+0x41/0x77 [ 461.614227] [<ffffffff81271d64>] ? dev_hard_start_xmit+0x357/0x543 [ 461.620659] [<ffffffff81291cf6>] ? nf_hook_slow+0x73/0x111 [ 461.626440] [<ffffffffa0379745>] ? br_parse_ip_options+0x19a/0x19a [bridge] [ 461.633581] [<ffffffff812722ff>] ? dev_queue_xmit+0x3af/0x459 [ 461.639577] [<ffffffffa03747d2>] ? br_dev_queue_push_xmit+0x72/0x76 [bridge] [ 461.646887] [<ffffffffa03791e3>] ? br_nf_post_routing+0x17d/0x18f [bridge] [ 461.653997] [<ffffffff81291c4d>] ? nf_iterate+0x41/0x77 [ 461.659473] [<ffffffffa0374760>] ? br_flood+0xfa/0xfa [bridge] [ 461.665485] [<ffffffff81291cf6>] ? nf_hook_slow+0x73/0x111 [ 461.671234] [<ffffffffa0374760>] ? br_flood+0xfa/0xfa [bridge] [ 461.677299] [<ffffffffa0379215>] ? nf_bridge_update_protocol+0x20/0x20 [bridge] [ 461.684891] [<ffffffffa03bb0e5>] ? nf_ct_zone+0xa/0x17 [nf_conntrack] [ 461.691520] [<ffffffffa0374760>] ? br_flood+0xfa/0xfa [bridge] [ 461.697572] [<ffffffffa0374812>] ? NF_HOOK.constprop.8+0x3c/0x56 [bridge] [ 461.704616] [<ffffffffa0379031>] ? nf_bridge_push_encap_header+0x1c/0x26 [bridge] [ 461.712329] [<ffffffffa037929f>] ? br_nf_forward_finish+0x8a/0x95 [bridge] [ 461.719490] [<ffffffffa037900a>] ? nf_bridge_pull_encap_header+0x1c/0x27 [bridge] [ 461.727223] [<ffffffffa0379974>] ? br_nf_forward_ip+0x1c0/0x1d4 [bridge] [ 461.734292] [<ffffffff81291c4d>] ? nf_iterate+0x41/0x77 [ 461.739758] [<ffffffffa03748cc>] ? __br_deliver+0xa0/0xa0 [bridge] [ 461.746203] [<ffffffff81291cf6>] ? nf_hook_slow+0x73/0x111 [ 461.751950] [<ffffffffa03748cc>] ? __br_deliver+0xa0/0xa0 [bridge] [ 461.758378] [<ffffffffa037533a>] ? NF_HOOK.constprop.4+0x56/0x56 [bridge] This is caused by bridge netfilter special dst_entry (fake_rtable), a special shared entry, where attaching an inetpeer makes no sense. Problem is present since commit 87c48fa3b46 (ipv6: make fragment identifications less predictable) Introduce DST_NOPEER dst flag and make sure ipv6_select_ident() and __ip_select_ident() fallback to the 'no peer attached' handling. Reported-by: Chris Boot <bootc@bootc.net> Tested-by: Chris Boot <bootc@bootc.net> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
OpenPOWER on IntegriCloud