blackbird-op-linux - Blackbird™ Linux sources for OpenPOWER

	Commit message (Collapse)	Author	Age	Files	Lines
*	[TCP]: Do TSO deferral even if tail SKB can go out now.	Dmitry Yusupov	2005-08-23	1	-4/+0
\| \| \| \| \| \| \| \| \| \| \| \|	If the tail SKB fits into the window, it is still benefitical to defer until the goal percentage of the window is available. This give the application time to feed more data into the send queue and thus results in larger TSO frames going out. Patch from Dmitry Yusupov <dima@neterion.com>. Signed-off-by: David S. Miller <davem@davemloft.net>
*	[TCP]: Fix bug #5070: kernel BUG at net/ipv4/tcp_output.c:864	Herbert Xu	2005-08-16	1	-19/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	1) We send out a normal sized packet with TSO on to start off. 2) ICMP is received indicating a smaller MTU. 3) We send the current sk_send_head which needs to be fragmented since it was created before the ICMP event. The first fragment is then sent out. At this point the remaining fragment is allocated by tcp_fragment. However, its size is padded to fit the L1 cache-line size therefore creating tail-room up to 124 bytes long. This fragment will also be sitting at sk_send_head. 4) tcp_sendmsg is called again and it stores data in the tail-room of of the fragment. 5) tcp_push_one is called by tcp_sendmsg which then calls tso_fragment since the packet as a whole exceeds the MTU. At this point we have a packet that has data in the head area being fed to tso_fragment which bombs out. My take on this is that we shouldn't ever call tcp_fragment on a TSO socket for a packet that is yet to be transmitted since this creates a packet on sk_send_head that cannot be extended. So here is a patch to change it so that tso_fragment is always used in this case. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[TCP]: Adjust {p,f}ackets_out correctly in tcp_retransmit_skb()	Herbert Xu	2005-08-10	1	-4/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Well I've only found one potential cause for the assertion failure in tcp_mark_head_lost. First of all, this can only occur if cnt > 1 since tp->packets_out is never zero here. If it did hit zero we'd have much bigger problems. So cnt is equal to fackets_out - reordering. Normally fackets_out is less than packets_out. The only reason I've found that might cause fackets_out to exceed packets_out is if tcp_fragment is called from tcp_retransmit_skb with a TSO skb and the current MSS is greater than the MSS stored in the TSO skb. This might occur as the result of an expiring dst entry. In that case, packets_out may decrease (line 1380-1381 in tcp_output.c). However, fackets_out is unchanged which means that it may in fact exceed packets_out. Previously tcp_retrans_try_collapse was the only place where packets_out can go down and it takes care of this by decrementing fackets_out. So we should make sure that fackets_out is reduced by an appropriate amount here as well. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[PATCH] tcp: fix TSO cwnd caching bug	Herbert Xu	2005-08-04	1	-25/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	tcp_write_xmit caches the cwnd value indirectly in cwnd_quota. When tcp_transmit_skb reduces the cwnd because of tcp_enter_cwr, the cached value becomes invalid. This patch ensures that the cwnd value is always reread after each tcp_transmit_skb call. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] tcp: fix TSO sizing bugs	David S. Miller	2005-08-04	1	-28/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	MSS changes can be lost since we preemptively initialize the tso_segs count for an SKB before we %100 commit to sending it out. So, by the time we send it out, the tso_size information can be stale due to PMTU events. This mucks up all of the logic in our send engine, and can even result in the BUG() triggering in tcp_tso_should_defer(). Another problem we have is that we're storing the tp->mss_cache, not the SACK block normalized MSS, as the tso_size. That's wrong too. Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[NET]: Fix sparse warnings	Victor Fusco	2005-07-08	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	From: Victor Fusco <victor@cetuc.puc-rio.br> Fix the sparse warning "implicit cast to nocast type" Signed-off-by: Victor Fusco <victor@cetuc.puc-rio.br> Signed-off-by: Domen Puncer <domen@coderock.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[TCP]: Never TSO defer under periods of congestion.	David S. Miller	2005-07-05	1	-0/+3
\| \| \| \| \| \| \| \| \|	Congestion window recover after loss depends upon the fact that if we have a full MSS sized frame at the head of the send queue, we will send it. TSO deferral can defeat the ACK clocking necessary to exit cleanly from recovery. Signed-off-by: David S. Miller <davem@davemloft.net>
*	[TCP]: Move to new TSO segmenting scheme.	David S. Miller	2005-07-05	1	-219/+359
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Make TSO segment transmit size decisions at send time not earlier. The basic scheme is that we try to build as large a TSO frame as possible when pulling in the user data, but the size of the TSO frame output to the card is determined at transmit time. This is guided by tp->xmit_size_goal. It is always set to a multiple of MSS and tells sendmsg/sendpage how large an SKB to try and build. Later, tcp_write_xmit() and tcp_push_one() chop up the packet if necessary and conditions warrant. These routines can also decide to "defer" in order to wait for more ACKs to arrive and thus allow larger TSO frames to be emitted. A general observation is that TSO elongates the pipe, thus requiring a larger congestion window and larger buffering especially at the sender side. Therefore, it is important that applications 1) get a large enough socket send buffer (this is accomplished by our dynamic send buffer expansion code) 2) do large enough writes. Signed-off-by: David S. Miller <davem@davemloft.net>
*	[TCP]: Eliminate redundant computations in tcp_write_xmit().	David S. Miller	2005-07-05	1	-9/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	tcp_snd_test() is run for every packet output by a single call to tcp_write_xmit(), but this is not necessary. For one, the congestion window space needs to only be calculated one time, then used throughout the duration of the loop. This cleanup also makes experimenting with different TSO packetization schemes much easier. Signed-off-by: David S. Miller <davem@davemloft.net>
*	[TCP]: Break out tcp_snd_test() into it's constituent parts.	David S. Miller	2005-07-05	1	-38/+82
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	tcp_snd_test() does several different things, use inline functions to express this more clearly. 1) It initializes the TSO count of SKB, if necessary. 2) It performs the Nagle test. 3) It makes sure the congestion window is adhered to. 4) It makes sure SKB fits into the send window. This cleanup also sets things up so that things like the available packets in the congestion window does not need to be calculated multiple times by packet sending loops such as tcp_write_xmit(). Signed-off-by: David S. Miller <davem@davemloft.net>
*	[TCP]: Fix __tcp_push_pending_frames() 'nonagle' handling.	David S. Miller	2005-07-05	1	-14/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	'nonagle' should be passed to the tcp_snd_test() function as 'TCP_NAGLE_PUSH' if we are checking an SKB not at the tail of the write_queue. This is because Nagle does not apply to such frames since we cannot possibly tack more data onto them. However, while doing this __tcp_push_pending_frames() makes all of the packets in the write_queue use this modified 'nonagle' value. Fix the bug and simplify this function by just calling tcp_write_xmit() directly if sk_send_head is non-NULL. As a result, we can now make tcp_data_snd_check() just call tcp_push_pending_frames() instead of the specialized __tcp_data_snd_check(). Signed-off-by: David S. Miller <davem@davemloft.net>
*	[TCP]: Fix redundant calculations of tcp_current_mss()	David S. Miller	2005-07-05	1	-12/+4
\| \| \| \| \| \| \| \| \| \| \|	tcp_write_xmit() uses tcp_current_mss(), but some of it's callers, namely __tcp_push_pending_frames(), already has this value available already. While we're here, fix the "cur_mss" argument to be "unsigned int" instead of plain "unsigned". Signed-off-by: David S. Miller <davem@davemloft.net>
*	[TCP]: tcp_write_xmit() tabbing cleanup	David S. Miller	2005-07-05	1	-34/+34
\| \| \| \| \| \| \|	Put the main basic block of work at the top-level of tabbing, and mark the TCP_CLOSE test with unlikely(). Signed-off-by: David S. Miller <davem@davemloft.net>
*	[TCP]: Kill extra cwnd validate in __tcp_push_pending_frames().	David S. Miller	2005-07-05	1	-30/+49
\| \| \| \| \| \| \| \| \| \| \|	The tcp_cwnd_validate() function should only be invoked if we actually send some frames, yet __tcp_push_pending_frames() will always invoke it. tcp_write_xmit() does the call for us, so the call here can simply be removed. Also, tcp_write_xmit() can be marked static. Signed-off-by: David S. Miller <davem@davemloft.net>
*	[TCP]: Add missing skb_header_release() call to tcp_fragment().	David S. Miller	2005-07-05	1	-0/+1
\| \| \| \| \| \| \| \|	When we add any new packet to the TCP socket write queue, we must call skb_header_release() on it in order for the TSO sharing checks in the drivers to work. Signed-off-by: David S. Miller <davem@davemloft.net>
*	[TCP]: Move __tcp_data_snd_check into tcp_output.c	David S. Miller	2005-07-05	1	-0/+10
\| \| \| \| \| \| \| \|	It reimplements portions of tcp_snd_check(), so it we move it to tcp_output.c we can consolidate it's logic much easier in a later change. Signed-off-by: David S. Miller <davem@davemloft.net>
*	[TCP]: Move send test logic out of net/tcp.h	David S. Miller	2005-07-05	1	-21/+129
\| \| \| \| \| \| \| \| \| \| \| \|	This just moves the code into tcp_output.c, no code logic changes are made by this patch. Using this as a baseline, we can begin to untangle the mess of comparisons for the Nagle test et al. We will also be able to reduce all of the redundant computation that occurs when outputting data packets. Signed-off-by: David S. Miller <davem@davemloft.net>
*	[TCP]: Fix quick-ack decrementing with TSO.	David S. Miller	2005-07-05	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On each packet output, we call tcp_dec_quickack_mode() if the ACK flag is set. It drops tp->ack.quick until it hits zero, at which time we deflate the ATO value. When doing TSO, we are emitting multiple packets with ACK set, so we should decrement tp->ack.quick that many segments. Note that, unlike this case, tcp_enter_cwr() should not take the tcp_skb_pcount(skb) into consideration. That function, one time, readjusts tp->snd_cwnd and moves into TCP_CA_CWR state. Signed-off-by: David S. Miller <davem@davemloft.net>
*	[TCP]: Add pluggable congestion control algorithm infrastructure.	Stephen Hemminger	2005-06-23	1	-15/+8
\| \| \| \| \| \| \| \| \| \|	Allow TCP to have multiple pluggable congestion control algorithms. Algorithms are defined by a set of operations and can be built in or modules. The legacy "new RENO" algorithm is used as a starting point and fallback. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[NET] Rename open_request to request_sock	Arnaldo Carvalho de Melo	2005-06-18	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Ok, this one just renames some stuff to have a better namespace and to dissassociate it from TCP: struct open_request -> struct request_sock tcp_openreq_alloc -> reqsk_alloc tcp_openreq_free -> reqsk_free tcp_openreq_fastfree -> __reqsk_free With this most of the infrastructure closely resembles a struct sock methods subset. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[NET] Generalise TCP's struct open_request minisock infrastructure	Arnaldo Carvalho de Melo	2005-06-18	1	-12/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Kept this first changeset minimal, without changing existing names to ease peer review. Basicaly tcp_openreq_alloc now receives the or_calltable, that in turn has two new members: ->slab, that replaces tcp_openreq_cachep ->obj_size, to inform the size of the openreq descendant for a specific protocol The protocol specific fields in struct open_request were moved to a class hierarchy, with the things that are common to all connection oriented PF_INET protocols in struct inet_request_sock, the TCP ones in tcp_request_sock, that is an inet_request_sock, that is an open_request. I.e. this uses the same approach used for the struct sock class hierarchy, with sk_prot indicating if the protocol wants to use the open_request infrastructure by filling in sk_prot->rsk_prot with an or_calltable. Results? Performance is improved and TCP v4 now uses only 64 bytes per open request minisock, down from 96 without this patch :-) Next changeset will rename some of the structs, fields and functions mentioned above, struct or_calltable is way unclear, better name it struct request_sock_ops, s/struct open_request/struct request_sock/g, etc. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[PATCH] update Ross Biro bouncing email address	Jesper Juhl	2005-05-05	1	-1/+1
\| \| \| \| \| \| \| \| \|	Ross moved. Remove the bad email address so people will find the correct one in ./CREDITS. Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[TCP]: skb pcount with MTU discovery	David S. Miller	2005-04-24	1	-11/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The problem is that when doing MTU discovery, the too-large segments in the write queue will be calculated as having a pcount of >1. When tcp_write_xmit() is trying to send, tcp_snd_test() fails the cwnd test when pcount > cwnd. The segments are eventually transmitted one at a time by keepalive, but this can take a long time. This patch checks if TSO is enabled when setting pcount. Signed-off-by: John Heffner <jheffner@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Linux-2.6.12-rc2v2.6.12-rc2	Linus Torvalds	2005-04-16	1	-0/+1739
	Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!