summaryrefslogtreecommitdiffstats
path: root/net/ipv4/ipvs
Commit message (Collapse)AuthorAgeFilesLines
* [IPVS]: Fix sysctl warnings about missing strategyChristian Borntraeger2007-11-191-4/+0
| | | | | | | | | | | | | | | | | | | | | | Running the latest git code I get the following messages during boot: sysctl table check failed: /net/ipv4/vs/drop_entry .3.5.21.4 Missing strategy [...] sysctl table check failed: /net/ipv4/vs/drop_packet .3.5.21.5 Missing strategy [...] sysctl table check failed: /net/ipv4/vs/secure_tcp .3.5.21.6 Missing strategy [...] sysctl table check failed: /net/ipv4/vs/sync_threshold .3.5.21.24 Missing strategy I removed the binary sysctl handler for those messages and also removed the definitions in ip_vs.h. The alternative would be to implement a proper strategy handler, but syscall sysctl is deprecated. There are other sysctl definitions that are commented out or work with the default sysctl_data strategy. I did not touch these. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPVS]: Remove unused exports.Adrian Bunk2007-11-122-2/+0
| | | | | | | | | This patch removes the following unused EXPORT_SYMBOL's: - ip_vs_try_bind_dest - ip_vs_find_dest Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPVS]: Synchronize closing of ConnectionsRumen G. Bogdanovski2007-11-072-7/+15
| | | | | | | | | | | | | | | | | | | | | | | | This patch makes the master daemon to sync the connection when it is about to close. This makes the connections on the backup to close or timeout according their state. Before the sync was performed only if the connection is in ESTABLISHED state which always made the connections to timeout in the hard coded 3 minutes. However the Andy Gospodarek's patch ([IPVS]: use proper timeout instead of fixed value) effectively did nothing more than increasing this to 15 minutes (Established state timeout). So this patch makes use of proper timeout since it syncs the connections on status changes to FIN_WAIT (2min timeout) and CLOSE (10sec timeout). However if the backup misses CLOSE hopefully it did not miss FIN_WAIT. Otherwise we will just have to wait for the ESTABLISHED state timeout. As it is without this patch. This way the number of the hanging connections on the backup is kept to minimum. And very few of them will be left to timeout with a long timeout. This is important if we want to make use of the fix for the real server overcommit on master/backup fail-over. Signed-off-by: Rumen G. Bogdanovski <rumen@voicecho.com> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPVS]: Bind connections on stanby if the destination existsRumen G. Bogdanovski2007-11-073-4/+65
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes the problem with node overload on director fail-over. Given the scenario: 2 nodes each accepting 3 connections at a time and 2 directors, director failover occurs when the nodes are fully loaded (6 connections to the cluster) in this case the new director will assign another 6 connections to the cluster, If the same real servers exist there. The problem turned to be in not binding the inherited connections to the real servers (destinations) on the backup director. Therefore: "ipvsadm -l" reports 0 connections: root@test2:~# ipvsadm -l IP Virtual Server version 1.2.1 (size=4096) Prot LocalAddress:Port Scheduler Flags -> RemoteAddress:Port Forward Weight ActiveConn InActConn TCP test2.local:5999 wlc -> node473.local:5999 Route 1000 0 0 -> node484.local:5999 Route 1000 0 0 while "ipvs -lnc" is right root@test2:~# ipvsadm -lnc IPVS connection entries pro expire state source virtual destination TCP 14:56 ESTABLISHED 192.168.0.10:39164 192.168.0.222:5999 192.168.0.51:5999 TCP 14:59 ESTABLISHED 192.168.0.10:39165 192.168.0.222:5999 192.168.0.52:5999 So the patch I am sending fixes the problem by binding the received connections to the appropriate service on the backup director, if it exists, else the connection will be handled the old way. So if the master and the backup directors are synchronized in terms of real services there will be no problem with server over-committing since new connections will not be created on the nonexistent real services on the backup. However if the service is created later on the backup, the binding will be performed when the next connection update is received. With this patch the inherited connections will show as inactive on the backup: root@test2:~# ipvsadm -l IP Virtual Server version 1.2.1 (size=4096) Prot LocalAddress:Port Scheduler Flags -> RemoteAddress:Port Forward Weight ActiveConn InActConn TCP test2.local:5999 wlc -> node473.local:5999 Route 1000 0 1 -> node484.local:5999 Route 1000 0 1 rumen@test2:~$ cat /proc/net/ip_vs IP Virtual Server version 1.2.1 (size=4096) Prot LocalAddress:Port Scheduler Flags -> RemoteAddress:Port Forward Weight ActiveConn InActConn TCP C0A800DE:176F wlc -> C0A80033:176F Route 1000 0 1 -> C0A80032:176F Route 1000 0 1 Regards, Rumen Bogdanovski Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Rumen G. Bogdanovski <rumen@voicecho.com> Signed-off-by: Simon Horman <horms@verge.net.au>
* [IPVS]: Remove /proc/net/ip_vs_lblcrAlexey Dobriyan2007-10-301-76/+0
| | | | | | | It's under CONFIG_IP_VS_LBLCR_DEBUG option which never existed. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPVS]: use proper timeout instead of fixed valueAndy Gospodarek2007-10-291-2/+3
| | | | | | | | | | | | Instead of using the default timeout of 3 minutes, this uses the timeout specific to the protocol used for the connection. The 3 minute timeout seems somewhat arbitrary (though I know it is used other places in the ipvs code) and when failing over it would be much nicer to use one of the configured timeout values. Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Treat the sign of the result of skb_headroom() consistentlyChuck Lever2007-10-231-1/+1
| | | | | | | | | In some places, the result of skb_headroom() is compared to an unsigned integer, and in others, the result is compared to a signed integer. Make the comparisons consistent and correct. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Use task_pid_nr() in ip_vs_sync.cPavel Emelyanov2007-10-191-1/+1
| | | | | | | | | | | The sync_master_pid and sync_backup_pid are set in set_sync_pid() and are used later for set/not-set checks and in printk. So it is safe to use the global pid value in this case. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Use helpers to obtain task pid in printksPavel Emelyanov2007-10-191-2/+2
| | | | | | | | | | | | | | | | The task_struct->pid member is going to be deprecated, so start using the helpers (task_pid_nr/task_pid_vnr/task_pid_nr_ns) in the kernel. The first thing to start with is the pid, printed to dmesg - in this case we may safely use task_pid_nr(). Besides, printks produce more (much more) than a half of all the explicit pid usage. [akpm@linux-foundation.org: git-drm went and changed lots of stuff] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Dave Airlie <airlied@linux.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [NETFILTER]: Replace sk_buff ** with sk_buff *Herbert Xu2007-10-156-102/+86
| | | | | | | | | With all the users of the double pointers removed, this patch mops up by finally replacing all occurances of sk_buff ** in the netfilter API by sk_buff *. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPVS]: Replace local version of skb_make_writableHerbert Xu2007-10-156-50/+16
| | | | | | | | This patch removes the IPVS-specific version of skb_make_writable and replaces it with the netfilter one. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: Change ip_defrag to return an integerHerbert Xu2007-10-151-16/+10
| | | | | | | | | Now that ip_frag always returns the packet given to it on input, we can change it to return an integer indicating error instead. This patch does that and updates all its callers accordingly. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Make core networking code use seq_open_privatePavel Emelyanov2007-10-101-18/+2
| | | | | | | | | | | This concerns the ipv4 and ipv6 code mostly, but also the netlink and unix sockets. The netlink code is an example of how to use the __seq_open_private() call - it saves the net namespace on this private. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: When possible test for IFF_LOOPBACK and not dev == loopback_devEric W. Biederman2007-10-101-1/+1
| | | | | | | | | | Now that multiple loopback devices are becoming possible it makes the code a little cleaner and more maintainable to test if a deivice is th a loopback device by testing dev->flags & IFF_LOOPBACK instead of dev == loopback_dev. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Dynamically allocate the loopback device, part 1.Daniel Lezcano2007-10-101-1/+1
| | | | | | | | | | | | | This patch replaces all occurences to the static variable loopback_dev to a pointer loopback_dev. That provides the mindless, trivial, uninteressting change part for the dynamic allocation for the loopback. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Acked-By: Kirill Korotaev <dev@sw.ru> Acked-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Make the device list and device lookups per namespace.Eric W. Biederman2007-10-101-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch makes most of the generic device layer network namespace safe. This patch makes dev_base_head a network namespace variable, and then it picks up a few associated variables. The functions: dev_getbyhwaddr dev_getfirsthwbytype dev_get_by_flags dev_get_by_name __dev_get_by_name dev_get_by_index __dev_get_by_index dev_ioctl dev_ethtool dev_load wireless_process_ioctl were modified to take a network namespace argument, and deal with it. vlan_ioctl_set and brioctl_set were modified so their hooks will receive a network namespace argument. So basically anthing in the core of the network stack that was affected to by the change of dev_base was modified to handle multiple network namespaces. The rest of the network stack was simply modified to explicitly use &init_net the initial network namespace. This can be fixed when those components of the network stack are modified to handle multiple network namespaces. For now the ifindex generator is left global. Fundametally ifindex numbers are per namespace, or else we will have corner case problems with migration when we get that far. At the same time there are assumptions in the network stack that the ifindex of a network device won't change. Making the ifindex number global seems a good compromise until the network stack can cope with ifindex changes when you change namespaces, and the like. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Make /proc/net per network namespaceEric W. Biederman2007-10-104-10/+14
| | | | | | | | | | | | | | | | | | This patch makes /proc/net per network namespace. It modifies the global variables proc_net and proc_net_stat to be per network namespace. The proc_net file helpers are modified to take a network namespace argument, and all of their callers are fixed to pass &init_net for that argument. This ensures that all of the /proc/net files are only visible and usable in the initial network namespace until the code behind them has been updated to be handle multiple network namespaces. Making /proc/net per namespace is necessary as at least some files in /proc/net depend upon the set of network devices which is per network namespace, and even more files in /proc/net have contents that are relevant to a single network namespace. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: Fix/improve deadlock condition on module removal netfilterNeil Horman2007-09-111-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | So I've had a deadlock reported to me. I've found that the sequence of events goes like this: 1) process A (modprobe) runs to remove ip_tables.ko 2) process B (iptables-restore) runs and calls setsockopt on a netfilter socket, increasing the ip_tables socket_ops use count 3) process A acquires a file lock on the file ip_tables.ko, calls remove_module in the kernel, which in turn executes the ip_tables module cleanup routine, which calls nf_unregister_sockopt 4) nf_unregister_sockopt, seeing that the use count is non-zero, puts the calling process into uninterruptible sleep, expecting the process using the socket option code to wake it up when it exits the kernel 4) the user of the socket option code (process B) in do_ipt_get_ctl, calls ipt_find_table_lock, which in this case calls request_module to load ip_tables_nat.ko 5) request_module forks a copy of modprobe (process C) to load the module and blocks until modprobe exits. 6) Process C. forked by request_module process the dependencies of ip_tables_nat.ko, of which ip_tables.ko is one. 7) Process C attempts to lock the request module and all its dependencies, it blocks when it attempts to lock ip_tables.ko (which was previously locked in step 3) Theres not really any great permanent solution to this that I can see, but I've developed a two part solution that corrects the problem Part 1) Modifies the nf_sockopt registration code so that, instead of using a use counter internal to the nf_sockopt_ops structure, we instead use a pointer to the registering modules owner to do module reference counting when nf_sockopt calls a modules set/get routine. This prevents the deadlock by preventing set 4 from happening. Part 2) Enhances the modprobe utilty so that by default it preforms non-blocking remove operations (the same way rmmod does), and add an option to explicity request blocking operation. So if you select blocking operation in modprobe you can still cause the above deadlock, but only if you explicity try (and since root can do any old stupid thing it would like.... :) ). Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPVS]: Use IP_VS_WAIT_WHILE when encessary.Heiko Carstens2007-08-131-1/+1
| | | | | | | | | | | | | | For architectures that don't have a volatile atomic_ts constructs like while (atomic_read(&something)); might result in endless loops since a barrier() is missing which forces the compiler to generate code that actually reads memory contents. Fix this in ipvs by using the IP_VS_WAIT_WHILE macro which resolves to while (expr) { cpu_relax(); } (why isn't this open coded btw?) Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: Clean up duplicate includes in net/ipv4/Jesper Juhl2007-08-131-1/+0
| | | | | | | | | This patch cleans up duplicate includes in net/ipv4/ Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPVS]: Use skb_forward_csumHerbert Xu2007-07-311-1/+1
| | | | | | | | | As a path that forwards packets, IPVS should be using skb_forward_csum instead of directly setting ip_summed to CHECKSUM_NONE. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* mm: Remove slab destructors from kmem_cache_create().Paul Mundt2007-07-201-1/+1
| | | | | | | | | | | | | | Slab destructors were no longer supported after Christoph's c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been BUGs for both slab and slub, and slob never supported them either. This rips out support for the dtor pointer from kmem_cache_create() completely and fixes up every single callsite in the kernel (there were about 224, not including the slab allocator definitions themselves, or the documentation references). Signed-off-by: Paul Mundt <lethal@linux-sh.org>
* [NET]: Make all initialized struct seq_operations const.Philippe De Muyter2007-07-103-3/+3
| | | | | | | Make all initialized struct seq_operations in net/ const Signed-off-by: Philippe De Muyter <phdm@macqel.be> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPVS]: Fix state variable on failure to start ipvs threadsNeil Horman2007-06-181-2/+39
| | | | | | | | | | | | | | | | | | | | | ip_vs currently fails to reset its ip_vs_sync_state variable if the sync thread fails to start properly. The result is that the kernel will report a running daemon when their actuall is none. If you issue the following commands: 1. ipvsadm --start-daemon master --mcast-interface bla 2. ipvsadm -L --daemon 3. ipvsadm --stop-daemon master Assuming that bla is not an actual interface, step 2 should return no data, but instead returns: $ ipvsadm -L --daemon master sync daemon (mcast=bla, syncid=0) Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPVS]: Use menuconfig objects.Jan Engelhardt2007-05-241-25/+5
| | | | | | | | | | Use menuconfigs instead of menus, so the whole menu can be disabled at once instead of going through all options. Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivialLinus Torvalds2007-05-091-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (25 commits) sound: convert "sound" subdirectory to UTF-8 MAINTAINERS: Add cxacru website/mailing list include files: convert "include" subdirectory to UTF-8 general: convert "kernel" subdirectory to UTF-8 documentation: convert the Documentation directory to UTF-8 Convert the toplevel files CREDITS and MAINTAINERS to UTF-8. remove broken URLs from net drivers' output Magic number prefix consistency change to Documentation/magic-number.txt trivial: s/i_sem /i_mutex/ fix file specification in comments drivers/base/platform.c: fix small typo in doc misc doc and kconfig typos Remove obsolete fat_cvf help text Fix occurrences of "the the " Fix minor typoes in kernel/module.c Kconfig: Remove reference to external mqueue library Kconfig: A couple of grammatical fixes in arch/i386/Kconfig Correct comments in genrtc.c to refer to correct /proc file. Fix more "deprecated" spellos. Fix "deprecated" typoes. ... Fix trivial comment conflict in kernel/relay.c.
| * Fix occurrences of "the the "Michael Opdenacker2007-05-091-1/+1
| | | | | | | | | | Signed-off-by: Michael Opdenacker <michael@free-electrons.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
* | unify flush_work/flush_work_keventd and rename it to cancel_work_syncOleg Nesterov2007-05-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | flush_work(wq, work) doesn't need the first parameter, we can use cwq->wq (this was possible from the very beginnig, I missed this). So we can unify flush_work_keventd and flush_work. Also, rename flush_work() to cancel_work_sync() and fix all callers. Perhaps this is not the best name, but "flush_work" is really bad. (akpm: this is why the earlier patches bypassed maintainers) Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Jeff Garzik <jeff@garzik.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Tejun Heo <htejun@gmail.com> Cc: Auke Kok <auke-jan.h.kok@intel.com>, Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | ipvs: flush defense_work before module unloadOleg Nesterov2007-05-091-0/+1
|/ | | | | | | | | | | | | | | | | | | | | | | | net/ipv4/ipvs/ip_vs_core.c module_exit ip_vs_cleanup ip_vs_control_cleanup cancel_rearming_delayed_work // done This is unsafe. The module may be unloaded and the memory may be freed while defense_work's handler is still running/preempted. Do flush_work(&defense_work.work) after cancel_rearming_delayed_work(). Alternatively, we could add flush_work() to cancel_rearming_delayed_work(), but note that we can't change cancel_delayed_work() in the same manner because it may be called from atomic context. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [NET]: Treat CHECKSUM_PARTIAL as CHECKSUM_UNNECESSARYHerbert Xu2007-04-251-4/+2
| | | | | | | | | When a transmitted packet is looped back directly, CHECKSUM_PARTIAL maps to the semantics of CHECKSUM_UNNECESSARY. Therefore we should treat it as such in the stack. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [SK_BUFF]: Introduce skb_copy_to_linear_data{_offset}Arnaldo Carvalho de Melo2007-04-251-1/+1
| | | | | | | To clearly state the intent of copying to linear sk_buffs, _offset being a overly long variant but interesting for the sake of saving some bytes. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
* [SK_BUFF]: Convert skb->tail to sk_buff_data_tArnaldo Carvalho de Melo2007-04-251-2/+2
| | | | | | | | | | | | | | | So that it is also an offset from skb->head, reduces its size from 8 to 4 bytes on 64bit architectures, allowing us to combine the 4 bytes hole left by the layer headers conversion, reducing struct sk_buff size to 256 bytes, i.e. 4 64byte cachelines, and since the sk_buff slab cache is SLAB_HWCACHE_ALIGN... :-) Many calculations that previously required that skb->{transport,network, mac}_header be first converted to a pointer now can be done directly, being meaningful as offsets or pointers. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [SK_BUFF]: Use offsets for skb->{mac,network,transport}_header on 64bit ↵Arnaldo Carvalho de Melo2007-04-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | architectures With this we save 8 bytes per network packet, leaving a 4 bytes hole to be used in further shrinking work, likely with the offsetization of other pointers, such as ->{data,tail,end}, at the cost of adds, that were minimized by the usual practice of setting skb->{mac,nh,n}.raw to a local variable that is then accessed multiple times in each function, it also is not more expensive than before with regards to most of the handling of such headers, like setting one of these headers to another (transport to network, etc), or subtracting, adding to/from it, comparing them, etc. Now we have this layout for sk_buff on a x86_64 machine: [acme@mica net-2.6.22]$ pahole vmlinux sk_buff struct sk_buff { struct sk_buff * next; /* 0 8 */ struct sk_buff * prev; /* 8 8 */ struct rb_node rb; /* 16 24 */ struct sock * sk; /* 40 8 */ ktime_t tstamp; /* 48 8 */ struct net_device * dev; /* 56 8 */ /* --- cacheline 1 boundary (64 bytes) --- */ struct net_device * input_dev; /* 64 8 */ sk_buff_data_t transport_header; /* 72 4 */ sk_buff_data_t network_header; /* 76 4 */ sk_buff_data_t mac_header; /* 80 4 */ /* XXX 4 bytes hole, try to pack */ struct dst_entry * dst; /* 88 8 */ struct sec_path * sp; /* 96 8 */ char cb[48]; /* 104 48 */ /* cacheline 2 boundary (128 bytes) was 24 bytes ago*/ unsigned int len; /* 152 4 */ unsigned int data_len; /* 156 4 */ unsigned int mac_len; /* 160 4 */ union { __wsum csum; /* 4 */ __u32 csum_offset; /* 4 */ }; /* 164 4 */ __u32 priority; /* 168 4 */ __u8 local_df:1; /* 172 1 */ __u8 cloned:1; /* 172 1 */ __u8 ip_summed:2; /* 172 1 */ __u8 nohdr:1; /* 172 1 */ __u8 nfctinfo:3; /* 172 1 */ __u8 pkt_type:3; /* 173 1 */ __u8 fclone:2; /* 173 1 */ __u8 ipvs_property:1; /* 173 1 */ /* XXX 2 bits hole, try to pack */ __be16 protocol; /* 174 2 */ void (*destructor)(struct sk_buff *); /* 176 8 */ struct nf_conntrack * nfct; /* 184 8 */ /* --- cacheline 3 boundary (192 bytes) --- */ struct sk_buff * nfct_reasm; /* 192 8 */ struct nf_bridge_info *nf_bridge; /* 200 8 */ __u16 tc_index; /* 208 2 */ __u16 tc_verd; /* 210 2 */ dma_cookie_t dma_cookie; /* 212 4 */ __u32 secmark; /* 216 4 */ __u32 mark; /* 220 4 */ unsigned int truesize; /* 224 4 */ atomic_t users; /* 228 4 */ unsigned char * head; /* 232 8 */ unsigned char * data; /* 240 8 */ unsigned char * tail; /* 248 8 */ /* --- cacheline 4 boundary (256 bytes) --- */ unsigned char * end; /* 256 8 */ }; /* size: 264, cachelines: 5 */ /* sum members: 260, holes: 1, sum holes: 4 */ /* bit holes: 1, sum bit holes: 2 bits */ /* last cacheline: 8 bytes */ On 32 bits nothing changes, and pointers continue to be used with the compiler turning all this abstraction layer into dust. But there are some sk_buff validation tricks that are now possible, humm... :-) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [SK_BUFF]: unions of just one member don't get anything done, kill themArnaldo Carvalho de Melo2007-04-251-2/+2
| | | | | | | | | Renaming skb->h to skb->transport_header, skb->nh to skb->network_header and skb->mac to skb->mac_header, to match the names of the associated helpers (skb[_[re]set]_{transport,network,mac}_header). Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [SK_BUFF]: More skb_put related skb_reset_transport_headerArnaldo Carvalho de Melo2007-04-251-1/+2
| | | | | | | | | This time we have to set it to skb->tail that is not anymore equal to skb->data, so we either add a new helper or just add the skb->tail - skb->data offset, for now do the later. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [SK_BUFF]: Introduce ip_hdr(), remove skb->nh.iphArnaldo Carvalho de Melo2007-04-2510-53/+51
| | | | | Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IP]: Introduce ip_hdrlen()Arnaldo Carvalho de Melo2007-04-254-17/+14
| | | | | | | | | | | | | | | | For the common sequence "skb->nh.iph->ihl * 4", removing a good number of open coded skb->nh.iph uses, now to go after the rest... Just out of curiosity, here are the idioms found to get the same result: skb->nh.iph->ihl << 2 skb->nh.iph->ihl<<2 skb->nh.iph->ihl * 4 skb->nh.iph->ihl*4 (skb->nh.iph)->ihl * sizeof(u32) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [SK_BUFF]: Introduce skb_network_header()Arnaldo Carvalho de Melo2007-04-252-3/+4
| | | | | | | | | For the places where we need a pointer to the network header, it is still legal to touch skb->nh.raw directly if just adding to, subtracting from or setting it to another layer header. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [SK_BUFF]: Use skb_reset_network_header in skb_push casesArnaldo Carvalho de Melo2007-04-251-1/+2
| | | | | | | | skb_push updates and returns skb->data, so we can just call skb_reset_network_header after the call to skb_push. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET] IPV4: Use hton{s,l}() where appropriate.YOSHIFUJI Hideaki2007-04-253-21/+21
| | | | | Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [PATCH] sysctl: remove insert_at_head from register_sysctlEric W. Biederman2007-02-143-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The semantic effect of insert_at_head is that it would allow new registered sysctl entries to override existing sysctl entries of the same name. Which is pain for caching and the proc interface never implemented. I have done an audit and discovered that none of the current users of register_sysctl care as (excpet for directories) they do not register duplicate sysctl entries. So this patch simply removes the support for overriding existing entries in the sys_sysctl interface since no one uses it or cares and it makes future enhancments harder. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Andi Kleen <ak@muc.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Corey Minyard <minyard@acm.org> Cc: Neil Brown <neilb@suse.de> Cc: "John W. Linville" <linville@tuxdriver.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Jan Kara <jack@ucw.cz> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: David Chinner <dgc@sgi.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [PATCH] remove many unneeded #includes of sched.hTim Schmielau2007-02-141-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | After Al Viro (finally) succeeded in removing the sched.h #include in module.h recently, it makes sense again to remove other superfluous sched.h includes. There are quite a lot of files which include it but don't actually need anything defined in there. Presumably these includes were once needed for macros that used to live in sched.h, but moved to other header files in the course of cleaning it up. To ease the pain, this time I did not fiddle with any header files and only removed #includes from .c-files, which tend to cause less trouble. Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha, arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig, allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all configs in arch/arm/configs on arm. I also checked that no new warnings were introduced by the patch (actually, some warnings are removed that were emitted by unnecessarily included header files). Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [PATCH] mark struct file_operations const 7Arjan van de Ven2007-02-123-4/+4
| | | | | | | | | | | Many struct file_operations in the kernel can be "const". Marking them const moves these to the .rodata section, which avoids false sharing with potential dirty data. In addition it'll catch accidental writes at compile time to these shared resources. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds2007-02-116-27/+27
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (45 commits) [IPV4]: Restore multipath routing after rt_next changes. [XFRM] IPV6: Fix outbound RO transformation which is broken by IPsec tunnel patch. [NET]: Reorder fields of struct dst_entry [DECNET]: Convert decnet route to use the new dst_entry 'next' pointer [IPV6]: Convert ipv6 route to use the new dst_entry 'next' pointer [IPV4]: Convert ipv4 route to use the new dst_entry 'next' pointer [NET]: Introduce union in struct dst_entry to hold 'next' pointer [DECNET]: fix misannotation of linkinfo_dn [DECNET]: FRA_{DST,SRC} are le16 for decnet [UDP]: UDP can use sk_hash to speedup lookups [NET]: Fix whitespace errors. [NET] XFRM: Fix whitespace errors. [NET] X25: Fix whitespace errors. [NET] WANROUTER: Fix whitespace errors. [NET] UNIX: Fix whitespace errors. [NET] TIPC: Fix whitespace errors. [NET] SUNRPC: Fix whitespace errors. [NET] SCTP: Fix whitespace errors. [NET] SCHED: Fix whitespace errors. [NET] RXRPC: Fix whitespace errors. ...
| * [NET] IPV4: Fix whitespace errors.YOSHIFUJI Hideaki2007-02-106-27/+27
| | | | | | | | | | Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | [PATCH] Transform kmem_cache_alloc()+memset(0) -> kmem_cache_zalloc().Robert P. J. Day2007-02-111-2/+1
|/ | | | | | | | | | | | | | | | | | | | | | Replace appropriate pairs of "kmem_cache_alloc()" + "memset(0)" with the corresponding "kmem_cache_zalloc()" call. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Andi Kleen <ak@muc.de> Cc: Roland McGrath <roland@redhat.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Greg KH <greg@kroah.com> Acked-by: Joel Becker <Joel.Becker@oracle.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Jan Kara <jack@ucw.cz> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [IPVS]: Make ip_vs_sync.c <= 80col wide.Simon Horman2006-12-111-3/+6
| | | | | Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPVS]: Use msleep_interruptable() instead of ssleep() aka msleep()Simon Horman2006-12-111-4/+4
| | | | | | | | | | | | | | | | Dean Manners notices that when an IPVS synchonisation daemons are started the system load slowly climbs up to 1. This seems to be related to the call to ssleep(1) (aka msleep(1000) in the main loop. Replacing this with a call to msleep_interruptable() seems to make the problem go away. Though I'm not sure that it is correct. This is the second edition of this patch, which replaces ssleep() in the main loop for both the master and backup threads, as well as some thread synchronisation code. The latter is just for thorougness as it shouldn't be causing any problems. Signed-Off-By: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [PATCH] slab: remove kmem_cache_tChristoph Lameter2006-12-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace all uses of kmem_cache_t with struct kmem_cache. The patch was generated using the following script: #!/bin/sh # # Replace one string by another in all the kernel sources. # set -e for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do quilt add $file sed -e "1,\$s/$1/$2/g" $file >/tmp/$$ mv /tmp/$$ $file quilt refresh done The script was run like this sh replace kmem_cache_t "struct kmem_cache" Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* Merge branch 'master' of ↵David Howells2006-12-052-0/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: drivers/ata/libata-scsi.c include/linux/libata.h Futher merge of Linus's head and compilation fixups. Signed-Off-By: David Howells <dhowells@redhat.com>
OpenPOWER on IntegriCloud