Commit message [Author, Age, Files, Lines]
* rfkill: add parameter to disable radios by default [Henrique de Moraes Holschuh, 2008-06-26, 1 file, -1/+10]
| Currently, radios are always enabled when their rfkill interface is registered. This is not optimal; the safest state for a radio is to be offline unless the user turns it on.
|
| Add a module parameter that causes all radios to be disabled when their rfkill interface is registered. The module default is not changed, so unless the parameter is used, radios will still be forced to their enabled state when they are registered.
|
| The new rfkill module parameter is called "default_state".
|
| Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
| Acked-by: Ivo van Doorn <IvDoorn@gmail.com>
| Signed-off-by: John W. Linville <linville@tuxdriver.com>
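A minimal userspace model of the registration-time behaviour, assuming the parameter simply selects the state a radio is forced into when its rfkill interface registers. `struct radio`, `radio_register()` and the enum values are illustrative stand-ins, not the rfkill source.

```c
#include <assert.h>

/* Userspace model only; names echo the commit text but are hypothetical. */
enum rfkill_state {
	RFKILL_STATE_OFF = 0,	/* radio blocked */
	RFKILL_STATE_ON = 1,	/* radio unblocked */
};

/* Stand-in for the "default_state" module parameter. Its default keeps
 * the old behaviour: radios are enabled when they register. */
static enum rfkill_state default_state = RFKILL_STATE_ON;

struct radio {
	const char *name;
	enum rfkill_state state;
};

/* Called when a radio's rfkill interface is registered: whatever state
 * the radio was in, it is forced to the configured default. */
static void radio_register(struct radio *r)
{
	r->state = default_state;
}
```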
* rfkill: handle SW_RFKILL_ALL events [Henrique de Moraes Holschuh, 2008-06-26, 1 file, -2/+43]
| Teach rfkill-input how to handle SW_RFKILL_ALL events (the new name for the SW_RADIO event).
|
| SW_RFKILL_ALL is an absolute enable-or-disable command that is tied to all radios in a system.
|
| Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
| Acked-by: Ivo van Doorn <IvDoorn@gmail.com>
| Cc: Dmitry Torokhov <dtor@mail.ru>
| Signed-off-by: John W. Linville <linville@tuxdriver.com>
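A sketch of what "absolute" means for SW_RFKILL_ALL, assuming a nonzero switch value means "radios enabled". The types and `handle_rfkill_all()` are hypothetical; the point is that the event value dictates one target state for every radio, so repeating an event changes nothing, unlike a toggle key.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical radio model. */
enum radio_state { RADIO_BLOCKED = 0, RADIO_UNBLOCKED = 1 };

struct radio {
	enum radio_state state;
};

/* Absolute switch: the event value says where every radio must end up,
 * regardless of each radio's current state. */
static void handle_rfkill_all(struct radio *radios, size_t n, int value)
{
	enum radio_state target = value ? RADIO_UNBLOCKED : RADIO_BLOCKED;

	for (size_t i = 0; i < n; i++)
		radios[i].state = target;
}
```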
* rfkill: fix minor typo in kernel doc [Henrique de Moraes Holschuh, 2008-06-26, 1 file, -1/+1]
| Fix a minor typo in the documentation of an exported function.
|
| Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
| Acked-by: Ivo van Doorn <IvDoorn@gmail.com>
| Cc: Dmitry Torokhov <dtor@mail.ru>
| Signed-off-by: John W. Linville <linville@tuxdriver.com>
* rfkill: clarify meaning of rfkill states [Henrique de Moraes Holschuh, 2008-06-26, 2 files, -3/+10]
| rfkill really should have been named rfswitch. As it is, one can get confused whether RFKILL_STATE_ON means the KILL switch is on (and therefore, the radio is being *blocked* from operating), or whether it means the RADIO rf output is on.
|
| Clearly state that RFKILL_STATE_ON means the radio is *unblocked* from operating (i.e. there is no rf killing going on).
|
| Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
| Acked-by: Ivo van Doorn <IvDoorn@gmail.com>
| Cc: Dmitry Torokhov <dtor@mail.ru>
| Signed-off-by: John W. Linville <linville@tuxdriver.com>
* Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/wireless-2.6 [John W. Linville, 2008-06-25, 22 files, -641/+739]
|\
| * wext: Emit event stream entries correctly when compat. [David S. Miller, 2008-06-16, 18 files, -308/+345]
| | Four major portions to this change:
| |
| | 1) Add IW_EV_COMPAT_LCP_LEN, IW_EV_COMPAT_POINT_OFF, and IW_EV_COMPAT_POINT_LEN helper defines.
| |
| | 2) Delete iw_stream_check_add_*(); they are unused.
| |
| | 3) Add an iw_request_info argument to iwe_stream_add_*(), and use it to size the event and pointer lengths correctly depending upon whether IW_REQUEST_FLAG_COMPAT is set or not.
| |
| | 4) The mechanical transformations to the drivers and wireless stack bits to get the iw_request_info passed down into the routines modified in #3. Also, explicit references to IW_EV_LCP_LEN are replaced with iwe_stream_lcp_len(info).
| |
| | With a lot of help and bug fixes from Masakazu Mokuno.
| |
| | Signed-off-by: David S. Miller <davem@davemloft.net>
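The compat sizing in item 3 can be sketched in userspace. The two structs below are hypothetical layouts, not the kernel's definitions; they show why the event header length depends on the caller: a native payload union is typically pointer-aligned (8 bytes on a 64-bit build), while a 32-bit compat caller expects its 32-bit pointer immediately after `len` and `cmd`. `REQUEST_FLAG_COMPAT` stands in for IW_REQUEST_FLAG_COMPAT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical native layout: header followed by a pointer-aligned union. */
struct native_event {
	uint16_t len;
	uint16_t cmd;
	union {
		void *pointer;		/* natural pointer size and alignment */
		uint32_t value;
	} u;
};

/* Hypothetical 32-bit compat layout: the pointer is only 4 bytes. */
struct compat_event {
	uint16_t len;
	uint16_t cmd;
	uint32_t pointer;
};

#define REQUEST_FLAG_COMPAT 0x0001	/* stand-in for IW_REQUEST_FLAG_COMPAT */

struct request_info {
	unsigned int flags;
};

/* Header ("LCP") length to use when writing an event for this caller. */
static size_t stream_lcp_len(const struct request_info *info)
{
	if (info->flags & REQUEST_FLAG_COMPAT)
		return offsetof(struct compat_event, pointer);
	return offsetof(struct native_event, u);
}
```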
| * wext: Create IW_REQUEST_FLAG_COMPAT and set it as needed. [David S. Miller, 2008-06-16, 2 files, -41/+34]
| | Now low-level WEXT ioctl handlers can do compat handling when necessary.
| |
| | Signed-off-by: David S. Miller <davem@davemloft.net>
| * wext: Remove compat handling from fs/compat_ioctl.c [David S. Miller, 2008-06-16, 1 file, -106/+1]
| | No longer used.
| |
| | Signed-off-by: David S. Miller <davem@davemloft.net>
| * wext: Dispatch and handle compat ioctls entirely in net/wireless/wext.c [David S. Miller, 2008-06-16, 5 files, -6/+134]
| | Next we can kill the hacks in fs/compat_ioctl.c and also dispatch compat ioctls down into the driver and 80211 protocol helper layers in order to handle iw_point objects embedded in stream replies which need to be translated.
| |
| | Signed-off-by: David S. Miller <davem@davemloft.net>
| * wext: Pull top-level ioctl dispatch logic into helper function. [David S. Miller, 2008-06-16, 1 file, -6/+20]
| | Signed-off-by: David S. Miller <davem@davemloft.net>
| * wext: Pass iwreq pointer down into standard/private handlers. [David S. Miller, 2008-06-16, 1 file, -9/+8]
| | They have no need to see the object as an ifreq.
| |
| | Signed-off-by: David S. Miller <davem@davemloft.net>
| * wext: Parameterize the standard/private handlers. [David S. Miller, 2008-06-16, 1 file, -8/+16]
| | The WEXT standard and private handlers to use are now arguments to wireless_process_ioctl().
| |
| | Signed-off-by: David S. Miller <davem@davemloft.net>
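A sketch of passing the handlers as arguments to one shared dispatcher, in the spirit of the parameterized wireless_process_ioctl(). The command ranges, `handler_fn` and `process_ioctl()` are hypothetical stand-ins; the real WEXT numbering and signatures may differ. The benefit shown is that a native and a compat entry point can reuse the same dispatch logic while supplying different handler sets.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical command ranges standing in for the WEXT ioctl numbers. */
#define CMD_FIRST      0x8B00
#define CMD_FIRST_PRIV 0x8BE0
#define CMD_LAST       0x8BFF

typedef int (*handler_fn)(unsigned int cmd, void *req);

/* Route a command to the standard or private handler supplied by the
 * caller; anything outside the wireless range is rejected. */
static int process_ioctl(unsigned int cmd, void *req,
			 handler_fn standard, handler_fn private_h)
{
	if (cmd < CMD_FIRST || cmd > CMD_LAST)
		return -1;			/* not a wireless ioctl */
	if (cmd >= CMD_FIRST_PRIV)
		return private_h(cmd, req);
	return standard(cmd, req);
}

/* Example handler set; a compat caller would pass a different pair. */
static int std_native(unsigned int cmd, void *req) { (void)cmd; (void)req; return 1; }
static int priv_native(unsigned int cmd, void *req) { (void)cmd; (void)req; return 2; }
```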
| * wext: Pull ioctl permission checking out into helper function. [David S. Miller, 2008-06-16, 1 file, -7/+15]
| | Signed-off-by: David S. Miller <davem@davemloft.net>
| * wext: Extract private call iw_point handling into seperate functions. [David S. Miller, 2008-06-16, 1 file, -67/+74]
| | Signed-off-by: David S. Miller <davem@davemloft.net>
| * wext: Extract standard call iw_point handling into seperate function. [David S. Miller, 2008-06-16, 1 file, -124/+134]
| | Signed-off-by: David S. Miller <davem@davemloft.net>
| * wext: Make adjust_priv_size() take a "struct iw_point *". [David S. Miller, 2008-06-16, 1 file, -3/+3]
| | Signed-off-by: David S. Miller <davem@davemloft.net>
| * wext: Remove inline from get_priv_size() and adjust_priv_size(). [David S. Miller, 2008-06-16, 1 file, -3/+2]
| | The compiler inlines when appropriate.
| |
| | Signed-off-by: David S. Miller <davem@davemloft.net>
* | bnx2x: Update version [Eilon Greenstein, 2008-06-23, 1 file, -2/+2]
| | Update to version 1.45.6.
| |
| | Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
* | bnx2x: Add PCIE EEH support [Wendy Xiong, 2008-06-23, 1 file, -6/+95]
| | Add PCI recovery functions to the driver. The initial PCI state is also saved so that the MSI state can be restored during PCI recovery.
| |
| | Signed-off-by: Wendy Xiong <wendyx@us.ibm.com>
| | Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
* | bnx2x: Enhanced self test [Yitchak Gertner, 2008-06-23, 1 file, -9/+487]
| | Added register, memory, loopback, NVRAM, interrupt and link tests to the self-test.
| |
| | Signed-off-by: Yitchak Gertner <gertner@broadcom.com>
| | Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
* | bnx2x: Re-factor Tx code [Eilon Greenstein, 2008-06-23, 1 file, -140/+304]
| | Add support for IPv6 TSO.
| |
| | Re-factor the Tx code into smaller functions to increase readability.
| |
| | Add linearization code for the case where a packet is too fragmented for the microcode to handle.
| |
| | Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
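A simplified sketch of the linearization decision, assuming a fixed per-packet descriptor budget. `MAX_BDS_PER_PKT` and the extra descriptor are hypothetical; the exact condition bnx2x evaluates is not given in the commit message and is likely more involved (for example for TSO packets).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical limit on buffer descriptors the microcode fetches per packet. */
#define MAX_BDS_PER_PKT 13

/* Descriptors needed: one for the linear head, one per page fragment, plus
 * one assumed extra (e.g. for parsing information). If the packet needs more
 * than the microcode can handle, the driver copies it into one buffer first. */
static bool needs_linearize(unsigned int nr_frags)
{
	unsigned int bds = 1 + nr_frags + 1;

	return bds > MAX_BDS_PER_PKT;
}
```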
* | bnx2x: Add TPA, Broadcoms HW LRO [Vladislav Zolotarov, 2008-06-23, 2 files, -136/+884]
| | TPA stands for Transparent Packet Aggregation. When enabled, the FW aggregates in-order TCP packets according to the 4-tuple match and sends one big packet to the driver. This packet is stored on an SGL in which each SGE is one page. The FW also implements a timeout algorithm and honors all TCP flags, including the push flag as a trigger to halt aggregation.
| |
| | After receiving Ben Hutchings' comments, we also added ethtool support, so now, thanks to Ben's patch, when forwarding is enabled, our aggregation is turned off using the LRO flags.
| |
| | Signed-off-by: Vladislav Zolotarov <vladz@broadcom.com>
| | Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
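A simplified model of the aggregation rule described above: merge segments that match the 4-tuple and arrive in order, and flush when PSH is seen. Everything here is hypothetical; the firmware timeout and the other TCP flags are not modelled, and sequence-number wraparound is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct tuple { uint32_t saddr, daddr; uint16_t sport, dport; };

struct seg {
	struct tuple t;
	uint32_t seq;
	uint32_t len;
	bool psh;
};

struct agg {
	bool active;
	struct tuple t;
	uint32_t next_seq;	/* sequence number expected next */
	uint32_t bytes;		/* payload aggregated so far */
};

enum agg_result { AGG_MERGED, AGG_MERGED_FLUSH, AGG_NOT_MERGED };

static bool same_tuple(const struct tuple *a, const struct tuple *b)
{
	return a->saddr == b->saddr && a->daddr == b->daddr &&
	       a->sport == b->sport && a->dport == b->dport;
}

static enum agg_result agg_add(struct agg *a, const struct seg *s)
{
	if (!a->active) {		/* first segment opens an aggregation */
		a->active = true;
		a->t = s->t;
		a->next_seq = s->seq;
		a->bytes = 0;
	}
	if (!same_tuple(&a->t, &s->t) || s->seq != a->next_seq)
		return AGG_NOT_MERGED;	/* delivered to the driver separately */
	a->bytes += s->len;
	a->next_seq += s->len;
	if (s->psh) {
		a->active = false;	/* PSH halts aggregation: flush now */
		return AGG_MERGED_FLUSH;
	}
	return AGG_MERGED;
}
```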
* | bnx2x: New statistics code [Yitchak Gertner, 2008-06-23, 3 files, -850/+1480]
| | To avoid race conditions with link up/down and driver up/down, the statistics handling was rewritten as a state machine.
| |
| | Also added support for 57711 statistics.
| |
| | Signed-off-by: Yitchak Gertner <gertner@broadcom.com>
| | Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
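A table-driven state machine in the spirit of the rewrite. The states, events and actions are hypothetical and far smaller than the driver's; the idea shown is that every (state, event) pair has an explicit entry, so an update arriving while statistics are stopped is simply ignored instead of touching stale state.

```c
#include <assert.h>

enum stats_state { STATS_DISABLED, STATS_ENABLED, STATS_STATE_MAX };
enum stats_event { EV_LINK_UP, EV_UPDATE, EV_STOP, STATS_EVENT_MAX };

struct stats_ctx {
	enum stats_state state;
	unsigned int updates;	/* counter refreshes actually performed */
};

typedef void (*stats_action)(struct stats_ctx *);

static void do_nothing(struct stats_ctx *c) { (void)c; }
static void do_update(struct stats_ctx *c) { c->updates++; }

struct transition { stats_action action; enum stats_state next; };

/* Explicit entry for every (state, event) pair. */
static const struct transition stm[STATS_STATE_MAX][STATS_EVENT_MAX] = {
	[STATS_DISABLED] = {
		[EV_LINK_UP] = { do_nothing, STATS_ENABLED },
		[EV_UPDATE]  = { do_nothing, STATS_DISABLED },
		[EV_STOP]    = { do_nothing, STATS_DISABLED },
	},
	[STATS_ENABLED] = {
		[EV_LINK_UP] = { do_nothing, STATS_ENABLED },
		[EV_UPDATE]  = { do_update,  STATS_ENABLED },
		[EV_STOP]    = { do_nothing, STATS_DISABLED },
	},
};

static void stats_handle(struct stats_ctx *c, enum stats_event ev)
{
	const struct transition *t = &stm[c->state][ev];

	t->action(c);
	c->state = t->next;
}
```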
* | bnx2x: Add support for BCM57711 HW [Eilon Greenstein, 2008-06-23, 9 files, -2153/+3752]
| | Support the 57711 and 57711E, referred to in the code as E1H. The 57710 is referred to as E1.
| |
| | To support the new members of the family, the bnx2x structure was divided into three parts: common, port and function. These changes caused some rearrangement in the bnx2x.h file. A set of accessor macros was added to make access to the bnx2x structure more readable.
| |
| | Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
* | bnx2x: New microcode part 3/3 [Eilon Greenstein, 2008-06-23, 1 file, -0/+4688]
| | The new microcode BLOB, broken into a separate patch to make it small enough for the mailing list.
| |
| | Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
* | bnx2x: New microcode part 2/3 [Eilon Greenstein, 2008-06-23, 1 file, -0/+3382]
| | The new microcode BLOB, broken into a separate patch to make it small enough for the mailing list.
| |
| | Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
* | bnx2x: New microcode part 1/3 [Eilon Greenstein, 2008-06-23, 1 file, -250/+4354]
| | The new microcode BLOB, broken into a separate patch to make it small enough for the mailing list.
| |
| | Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
* | bnx2x: Remove old microcode [Eilon Greenstein, 2008-06-23, 1 file, -4900/+1]
| | Remove the old microcode from the BLOB, broken into a separate patch to make it small enough for the mailing list.
| |
| | Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
* | bnx2x: New init infrastructure [Eilon Greenstein, 2008-06-23, 5 files, -656/+2130]
| | This new initialization code supports the 57711 HW. It also supports emulation and FPGA initialization values for the 57711 and 57710 (a very small amount of code, less than 30 lines, which is very helpful in the lab).
| |
| | The initialization is done via DMAE once the DMAE block is ready; before it is ready, some of the initialization is done via PCI configuration transactions (referred to as indirect writes). A mutex was added to prevent overlapping DMAE operations.
| |
| | There are a few new registers which need to be initialized by SW; the full comment for those registers is added to the register file.
| |
| | A placeholder for the 57711 (referred to as E1H) microcode was added; the microcode itself is too big and is split over the following 4 patches.
| |
| | Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
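A sketch of the two write paths and the mutex, assuming a single DMA engine shared by all callers. `struct dev`, `init_block()` and the register array are hypothetical; an indirect write is modelled as a plain per-word store and a DMAE transfer as one block copy. Build with `-pthread` on toolchains that need it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NREGS 64

/* Hypothetical device model. */
struct dev {
	uint32_t regs[NREGS];
	bool dmae_ready;
	pthread_mutex_t dmae_mutex;	/* serializes use of the single DMAE */
	unsigned int dmae_writes, indirect_writes;
};

/* Before DMAE is ready: one word at a time via config space. */
static void write_indirect(struct dev *d, unsigned int addr, uint32_t val)
{
	d->regs[addr] = val;
	d->indirect_writes++;
}

static void init_block(struct dev *d, unsigned int addr,
		       const uint32_t *vals, unsigned int n)
{
	assert(addr + n <= NREGS);
	if (!d->dmae_ready) {
		for (unsigned int i = 0; i < n; i++)
			write_indirect(d, addr + i, vals[i]);
		return;
	}
	pthread_mutex_lock(&d->dmae_mutex);	/* no overlapping DMAE jobs */
	for (unsigned int i = 0; i < n; i++)
		d->regs[addr + i] = vals[i];	/* stands in for one DMAE transfer */
	d->dmae_writes++;
	pthread_mutex_unlock(&d->dmae_mutex);
}
```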
* | bnx2x: New link code [Yaniv Rosner, 2008-06-23, 5 files, -3399/+1826]
| | New link code: move all the link-related code (including the calculations, the initialization of the MAC and PHY, and the external PHY code) into a separate file.
| |
| | The changes from the code that used to be part of bnx2x.c (now called bnx2x_main.c) are:
| | - Using separate structures for link inputs and link outputs to clearly identify what was configured and what the outcome is
| | - Adding code to read the external PHY FW version and print it as part of ethtool -i
| | - Adding code to upgrade the external PHY FW from ethtool -E with a special magic number
| | - Changing the link down indication to ERR level
| | - Adding a lock on all PHY access to prevent an interrupt and setting changes from overlapping
| | - Adding support for emulation and FPGA (a small chunk of code that really helps in the lab)
| | - Adding support for 1G on the BCM8706 PHY
| | - Adding a clear debug print in case of fan failure (the PHY type is now "failure")
| |
| | Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
| | Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
* | bnx2x: Adding bnx2x_link [Yaniv Rosner, 2008-06-23, 2 files, -0/+4686]
| | This patch adds the new bnx2x_link files (C and H). The files are not yet used in this patch, only in the next one, so that each patch is small enough for the mailing list.
| |
| | Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
| | Signed-off-by: Eilong Greenstein <eilong@broadcom.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
* | bnx2x: Rename bnx2x.c to bnx2x_main.c [Eilon Greenstein, 2008-06-23, 2 files, -0/+1]
| | This patch is the rename of bnx2x.c to bnx2x_main.c.
| |
| | Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
* | sctp: Kill unused variable in sctp_assoc_bh_rcv() [Vlad Yasevich, 2008-06-20, 1 file, -1/+0]
| | Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
* | bnx2: Update driver version to 1.7.7. [Michael Chan, 2008-06-19, 1 file, -3/+3]
| | Also update the module description.
| |
| | Signed-off-by: Michael Chan <mchan@broadcom.com>
| | Signed-off-by: Benjamin Li <benli@broadcom.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
* | bnx2: Cleanup error handling in bnx2_open(). [Michael Chan, 2008-06-19, 1 file, -23/+14]
| | All error handling in bnx2_open() can be consolidated.
| |
| | Signed-off-by: Michael Chan <mchan@broadcom.com>
| | Signed-off-by: Benjamin Li <benli@broadcom.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
* | bnx2: Turn on multi rx rings. [Michael Chan, 2008-06-19, 2 files, -17/+80]
| | Enable multiple rx rings if MSI-X vectors are available. We enable up to 7 rx rings.
| |
| | Signed-off-by: Michael Chan <mchan@broadcom.com>
| | Signed-off-by: Benjamin Li <benli@broadcom.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
* | bnx2: Update firmware to support multi rx rings. [Michael Chan, 2008-06-19, 1 file, -4419/+4437]
| | Signed-off-by: Michael Chan <mchan@broadcom.com>
| | Signed-off-by: Benjamin Li <benli@broadcom.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
* | bnx2: Use one handler for all MSI-X vectors. [Michael Chan, 2008-06-19, 1 file, -59/+40]
| | Use the same MSI-X handler to schedule NAPI. Change the dev_instance void pointer to point to the bnx2_napi struct instead, so we have the proper context for each MSI-X vector.
| |
| | Add a new bnx2_poll_msix() that is optimized for handling MSI-X NAPI polling of rx/tx work only. Remove the old bnx2_tx_poll(), which is no longer needed.
| |
| | Each MSI-X vector handles 1 tx and 1 rx ring. The first vector handles link events as well.
| |
| | Signed-off-by: Michael Chan <mchan@broadcom.com>
| | Signed-off-by: Benjamin Li <benli@broadcom.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
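A sketch of the single-handler pattern: each vector's context is the opaque cookie registered with its interrupt (the `dev` argument of the kernel's `request_irq()`), so one function can serve every vector without a lookup. `struct vec_ctx` and `msix_handler()` are hypothetical userspace stand-ins for bnx2_napi and the driver's handler.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-vector context, one per MSI-X vector. */
struct vec_ctx {
	int index;
	bool napi_scheduled;
};

/* Shared handler: the cookie passed at registration time is this vector's
 * own context, so the handler knows which rings to poll. */
static int msix_handler(int irq, void *dev_instance)
{
	struct vec_ctx *v = dev_instance;

	(void)irq;
	v->napi_scheduled = true;	/* would be a NAPI schedule in the driver */
	return 1;			/* "handled" */
}
```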
* | bnx2: Optimize fast-path tx and rx work. [Michael Chan, 2008-06-19, 2 files, -44/+72]
| | Add hw_tx_cons_ptr and hw_rx_cons_ptr to speed up the retrieval of the tx and rx consumer indices, since the MSI-X and default status blocks have different structures.
| |
| | Combine status_blk and status_blk_msix into a union. We only use one type of status block for each vector.
| |
| | Separate the code that detects more rx and tx work from the code that detects link-related work.
| |
| | Signed-off-by: Michael Chan <mchan@broadcom.com>
| | Signed-off-by: Benjamin Li <benli@broadcom.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
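A sketch of resolving the consumer-index location once per vector, so the fast path needs no check of the status-block layout. Both layouts are hypothetical; only the idea of a union plus precomputed `hw_tx_cons_ptr`/`hw_rx_cons_ptr` comes from the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layouts: the indices sit at different offsets in each. */
struct status_block_default {
	uint32_t attn_bits;		/* link/attention bits */
	uint16_t tx_cons[4];
	uint16_t rx_cons[4];
};

struct status_block_msix {
	uint16_t tx_cons;
	uint16_t rx_cons;
};

struct vector {
	union {
		struct status_block_default def;
		struct status_block_msix msix;
	} status;			/* only one layout is used per vector */
	uint16_t *hw_tx_cons_ptr;	/* resolved once at setup */
	uint16_t *hw_rx_cons_ptr;
};

static void vector_setup(struct vector *v, int is_msix)
{
	if (is_msix) {
		v->hw_tx_cons_ptr = &v->status.msix.tx_cons;
		v->hw_rx_cons_ptr = &v->status.msix.rx_cons;
	} else {
		v->hw_tx_cons_ptr = &v->status.def.tx_cons[0];
		v->hw_rx_cons_ptr = &v->status.def.rx_cons[0];
	}
}

/* Fast path: a single dereference, no branch on the block type. */
static uint16_t hw_rx_cons(const struct vector *v)
{
	return *v->hw_rx_cons_ptr;
}
```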
* | bnx2: Put rx ring variables in a separate struct. [Michael Chan, 2008-06-19, 2 files, -169/+259]
| | In preparation for multi-ring support, rx ring variables are now put in a separate bnx2_rx_ring_info struct. With MSI-X, we can support multiple rx rings.
| |
| | The functions to allocate/free rx memory and to initialize rx rings are now modified to handle multiple rings.
| |
| | Signed-off-by: Michael Chan <mchan@broadcom.com>
| | Signed-off-by: Benjamin Li <benli@broadcom.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
* | bnx2: Put tx ring variables in a separate struct. [Michael Chan, 2008-06-19, 2 files, -129/+199]
| | In preparation for multi-ring support, tx ring variables are now put in a separate bnx2_tx_ring_info struct. Multiple tx rings will not be enabled until they are fully supported by the stack. Only 1 tx ring will be used at the moment.
| |
| | The functions to allocate/free tx memory and to initialize tx rings are now modified to handle multiple rings.
| |
| | Signed-off-by: Michael Chan <mchan@broadcom.com>
| | Signed-off-by: Benjamin Li <benli@broadcom.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: Discard and warn about LRO'd skbs received for forwarding [Ben Hutchings, 2008-06-19, 5 files, -1/+29]
| | Add skb_warn_if_lro() to test whether an skb was received with LRO and warn if so.
| |
| | Change br_forward(), ip_forward() and ip6_forward() to call it and discard the skb if it returns true.
| |
| | Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
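A sketch of the forwarding-path check. How an LRO'd packet is recognised is an assumption in this model: a nonzero segment size with no GSO type is treated as LRO output, whereas a packet that genuinely wants GSO carries both. `struct pkt`, `warn_if_lro()` and `forward()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical packet metadata. */
struct pkt {
	unsigned int gso_size;
	unsigned int gso_type;
};

/* Assumed heuristic: LRO fills in a segment size but no GSO type. */
static bool warn_if_lro(const struct pkt *p)
{
	if (p->gso_size != 0 && p->gso_type == 0) {
		fprintf(stderr, "received LRO packet while forwarding; dropping\n");
		return true;
	}
	return false;
}

/* Forwarding path: returns 0 if forwarded, -1 if the packet was dropped. */
static int forward(const struct pkt *p)
{
	if (warn_if_lro(p))
		return -1;
	return 0;
}
```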
* | net: Disable LRO on devices that are forwarding [Ben Hutchings, 2008-06-19, 5 files, -5/+48]
| | Large Receive Offload (LRO) is only appropriate for packets that are destined for the host, and should be disabled if received packets may be forwarded. It can also confuse GSO on output.
| |
| | Add a dev_disable_lro() function which uses the appropriate ethtool ops to disable LRO if it is enabled.
| |
| | Add calls to dev_disable_lro() in br_add_if() and in the functions that enable IPv4 and IPv6 forwarding.
| |
| | Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
* | sctp: Follow security requirement of responding with 1 packet [Vlad Yasevich, 2008-06-19, 6 files, -33/+54]
| | RFC 4960, Section 11.4. Protection of Non-SCTP-Capable Hosts
| |
| |   When an SCTP stack receives a packet containing multiple control or DATA chunks and the processing of the packet requires the sending of multiple chunks in response, the sender of the response chunk(s) MUST NOT send more than one packet. If bundling is supported, multiple response chunks that fit into a single packet MAY be bundled together into one single response packet. If bundling is not supported, then the sender MUST NOT send more than one response chunk and MUST discard all other responses. Note that this rule does NOT apply to a SACK chunk, since a SACK chunk is, in itself, a response to DATA and a SACK does not require a response of more DATA.
| |
| | We implement this by not servicing our outqueue until we reach the end of the packet. This enables maximum bundling. We also identify 'response' chunks and make sure that we only send 1 packet when sending such chunks.
| |
| | Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
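A sketch of "defer, then send at most one packet": responses generated while processing one inbound packet are queued and flushed once at the end, bundling what fits and discarding the rest, as the quoted RFC text allows. The queue, the MTU handling (headers ignored) and the greedy fit are assumptions; the SACK exemption and how Linux classifies response chunks are not modelled.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RESP 16

/* Hypothetical queue of response chunk sizes for one inbound packet. */
struct resp_queue {
	size_t sizes[MAX_RESP];
	size_t count;
};

static void queue_response(struct resp_queue *q, size_t chunk_len)
{
	if (q->count < MAX_RESP)
		q->sizes[q->count++] = chunk_len;
}

/* Called once, at the end of the inbound packet: bundle queued chunks into
 * a single packet of at most `mtu` bytes and discard chunks that do not
 * fit. Returns the bytes placed in the one outgoing packet. */
static size_t flush_one_packet(struct resp_queue *q, size_t mtu, size_t *dropped)
{
	size_t used = 0;

	*dropped = 0;
	for (size_t i = 0; i < q->count; i++) {
		if (used + q->sizes[i] <= mtu)
			used += q->sizes[i];
		else
			(*dropped)++;
	}
	q->count = 0;
	return used;
}
```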
* | sctp: Validate Initiate Tag when handling ICMP message [Wei Yongjun, 2008-06-19, 1 file, -2/+25]
| | This patch adds validation of the Initiate Tag and chunk type when the Verification Tag is 0 while handling an ICMP message.
| |
| | RFC 4960, Appendix C. ICMP Handling
| |
| |   ICMP6) An implementation MUST validate that the Verification Tag contained in the ICMP message matches the Verification Tag of the peer. If the Verification Tag is not 0 and does NOT match, discard the ICMP message. If it is 0 and the ICMP message contains enough bytes to verify that the chunk type is an INIT chunk and that the Initiate Tag matches the tag of the peer, continue with ICMP7. If the ICMP message is too short or the chunk type or the Initiate Tag does not match, silently discard the packet.
| |
| | Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
| | Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
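The ICMP6 rule quoted above can be written as a byte-level check over the SCTP packet quoted in the ICMP payload. Offsets follow the RFC 4960 common header (12 bytes) and INIT chunk layout (type byte, flags, length, then the Initiate Tag); `expected_init_tag` stands for "the tag of the peer" in the RFC wording, since the commit does not say which association field the kernel compares. The function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SCTP_CID_INIT 1		/* INIT chunk type */

static uint32_t rd32(const uint8_t *p)	/* network byte order */
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* `pkt` starts at the quoted SCTP common header, `len` is the number of
 * quoted bytes. Returns true if the ICMP message may be processed further. */
static bool icmp_vtag_ok(const uint8_t *pkt, size_t len,
			 uint32_t peer_vtag, uint32_t expected_init_tag)
{
	if (len < 8)
		return false;			/* cannot even read the tag */
	uint32_t vtag = rd32(pkt + 4);		/* after source and dest ports */
	if (vtag != 0)
		return vtag == peer_vtag;
	/* Tag 0: need common header (12) + chunk header (4) + Initiate Tag (4). */
	if (len < 20)
		return false;
	if (pkt[12] != SCTP_CID_INIT)
		return false;
	return rd32(pkt + 16) == expected_init_tag;
}
```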
* | Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 [David S. Miller, 2008-06-19, 20 files, -272/+307]
|\ \
| | | Conflicts:
| | |     net/mac80211/tx.c
| * | mac80211: detect driver tx bugs [Johannes Berg, 2008-06-18, 1 file, -1/+8]
| | | When a driver rejects a frame in its ->tx() callback, it must also stop queues; otherwise mac80211 can go into a loop here. Detect this situation and abort the loop after five retries, warning about the driver bug.
| | |
| | | Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
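A sketch of the loop guard: a busy return from the driver is acceptable only if the driver also stopped its queue; otherwise the loop gives up after a fixed number of attempts and warns. The names are hypothetical, and whether the limit counts five attempts or five retries after the first is a detail this model does not try to match.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_DRIVER_REJECTS 5	/* "five retries" from the commit message */

struct txq {
	bool stopped;
};

/* Hypothetical driver callback: 0 on success, nonzero if busy. */
typedef int (*driver_tx_fn)(struct txq *q);

/* Returns 0 if sent, 1 if the driver stopped its queue (retry later),
 * -1 if the driver kept rejecting without stopping the queue. */
static int tx_frame(struct txq *q, driver_tx_fn tx)
{
	for (int attempt = 0; attempt < MAX_DRIVER_REJECTS; attempt++) {
		if (tx(q) == 0)
			return 0;
		if (q->stopped)
			return 1;	/* proper flow control: requeue and wait */
	}
	fprintf(stderr, "driver bug: tx rejected but queue not stopped\n");
	return -1;
}

static int ok_driver(struct txq *q) { (void)q; return 0; }
static int busy_driver(struct txq *q) { q->stopped = true; return 1; }
static int buggy_driver(struct txq *q) { (void)q; return 1; }
```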
| * | netlink: genl: fix circular locking [Patrick McHardy, 2008-06-18, 1 file, -9/+6]
| | | genetlink has a circular locking dependency when dumping the registered families:
| | |
| | | - dump start:
| | |   genl_rcv()            : take genl_mutex
| | |   genl_rcv_msg()        : call netlink_dump_start() while holding genl_mutex
| | |   netlink_dump_start(),
| | |   netlink_dump()        : take nlk->cb_mutex
| | |   ctrl_dumpfamily()     : try to detect this case and not take genl_mutex a second time
| | |
| | | - dump continuance:
| | |   netlink_rcv()         : call netlink_dump
| | |   netlink_dump          : take nlk->cb_mutex
| | |   ctrl_dumpfamily()     : take genl_mutex
| | |
| | | Register genl_lock as the callback mutex with netlink to fix this. This slightly widens an already existing module unload race: the genl ops used during the dump might go away when the module is unloaded. Thomas Graf is working on a separate fix for this.
| | |
| | | Signed-off-by: Patrick McHardy <kaber@trash.net>
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
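A small lock-order recorder shows why the two paths listed above conflict: one takes genl_mutex and then nlk->cb_mutex, the other takes them in the reverse order. This is a hypothetical model, not lockdep; after the fix the callback mutex is genl_mutex itself, so this two-lock cycle can no longer form.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical recorder: remembers every "held A while taking B" edge. */
enum lock_id { GENL_MUTEX, CB_MUTEX, NLOCKS };

static bool order_seen[NLOCKS][NLOCKS];

/* Record acquiring `next` while holding `held` (two distinct locks).
 * Returns false if the opposite order was already observed, i.e. the two
 * paths together could deadlock (ABBA). */
static bool acquire_while_holding(enum lock_id held, enum lock_id next)
{
	assert(held != next);
	order_seen[held][next] = true;
	return !order_seen[next][held];
}
```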
| * | Revert "mac80211: Use skb_header_cloned() on TX path." [David S. Miller, 2008-06-18, 1 file, -2/+2]
| | | This reverts commit 608961a5eca8d3c6bd07172febc27b5559408c5d.
| | |
| | | The problem is that the mac80211 stack not only needs to be able to muck with the link-level headers, it also might need to mangle all of the packet data if doing sw wireless encryption.
| | |
| | | This fixes kernel bugzilla #10903.
| | |
| | | Thanks to Didier Raboud (for the bugzilla report), Andrew Prince (for bisecting), Johannes Berg (for bringing this bisection analysis to my attention), and Ilpo (for trying to analyze this purely from the TCP side).
| | |
| | | In 2.6.27 we can take another stab at this, by using something like skb_cow_data() when the TX path of mac80211 ends up with a non-NULL tx->key. The ESP protocol code in the IPSEC stack can be used as a model for implementation.
| | |
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
| * | af_unix: fix 'poll for write'/ connected DGRAM sockets [Rainer Weikusat, 2008-06-17, 1 file, -9/+70]
| | | The unix_dgram_sendmsg routine implements a (somewhat crude) form of receiver-imposed flow control by comparing the length of the receive queue of the 'peer socket' with the max_ack_backlog value stored in the corresponding sock structure, either blocking the thread which caused the send routine to be called or returning EAGAIN. This routine is used by both SOCK_DGRAM and SOCK_SEQPACKET sockets. The poll implementation for these socket types is datagram_poll from core/datagram.c. A socket is deemed to be writeable by this routine when the memory presently consumed by datagrams owned by it is less than the configured socket send buffer size. This is always wrong for connected PF_UNIX non-stream sockets when the abovementioned receive queue is currently considered to be full. 'poll' will then return, indicating that the socket is writeable, but a subsequent write results in EAGAIN, effectively causing a (usual) application to 'poll for writeability by repeated send request with O_NONBLOCK set' until it has consumed its time quantum.
| | |
| | | This change uses a suitably modified variant of the datagram_poll routine for both types of PF_UNIX sockets, which tests whether the receive queue of the peer a socket is connected to is presently considered to be 'full' as part of the 'is this socket writeable' checking code. The socket being polled is additionally put onto the peer_wait wait queue associated with its peer, because the unix_dgram_sendmsg routine does a wakeup on this queue after a datagram was received, while the 'other wakeup call' is done implicitly as part of skb destruction. Without this, a process blocked in poll because of a full peer receive queue could sleep forever if no datagram owned by its socket was already sitting on that queue.
| | |
| | | Part of this change is a small (inline) helper routine named 'unix_recvq_full', which consolidates the actual testing code (previously in three different places) into a single location.
| | |
| | | Signed-off-by: Rainer Weikusat <rweikusat@mssgmbh.com>
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
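A model of the writability test before and after the change. The fields are hypothetical, and the old rule follows the commit's description (bytes of our datagrams in flight below the send buffer size) rather than datagram_poll's exact code; the peer_wait wakeup is not modelled. `unix_recvq_full()` here assumes "full" means more than max_ack_backlog datagrams are queued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the fields involved. */
struct usock {
	unsigned int recvq_len;		/* datagrams queued for reading */
	unsigned int max_ack_backlog;	/* receiver-imposed limit */
	unsigned int wmem_alloc;	/* bytes of our datagrams in flight */
	unsigned int sndbuf;		/* send buffer size */
	struct usock *peer;		/* connected peer, or NULL */
};

static bool unix_recvq_full(const struct usock *sk)
{
	return sk->recvq_len > sk->max_ack_backlog;
}

/* Old rule: writable whenever our own send buffer has room. */
static bool writable_old(const struct usock *sk)
{
	return sk->wmem_alloc < sk->sndbuf;
}

/* New rule: also not writable while the connected peer's queue is full,
 * so poll() no longer reports a socket whose next send would hit EAGAIN. */
static bool writable_new(const struct usock *sk)
{
	if (!writable_old(sk))
		return false;
	return !(sk->peer && unix_recvq_full(sk->peer));
}
```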
OpenPOWER on IntegriCloud