summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
...
| | * | tcp: tcp_make_synack() should clear skb->tstampEric Dumazet2015-04-091-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I noticed tcpdump was giving funky timestamps for locally generated SYNACK messages on loopback interface. 11:42:46.938990 IP 127.0.0.1.48245 > 127.0.0.2.23850: S 945476042:945476042(0) win 43690 <mss 65495,nop,nop,sackOK,nop,wscale 7> 20:28:58.502209 IP 127.0.0.2.23850 > 127.0.0.1.48245: S 3160535375:3160535375(0) ack 945476043 win 43690 <mss 65495,nop,nop,sackOK,nop,wscale 7> This is because we need to clear skb->tstamp before entering lower stack, otherwise net_timestamp_check() does not set skb->tstamp. Fixes: 7faee5c0d514 ("tcp: remove TCP_SKB_CB(skb)->when") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | udptunnels: Call handle_offloads after inserting vlan tag.Jesse Gross2015-04-092-14/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | handle_offloads() calls skb_reset_inner_headers() to store the layer pointers to the encapsulated packet. However, we currently push the vlag tag (if there is one) onto the packet afterwards. This changes the MAC header for the encapsulated packet but it is not reflected in skb->inner_mac_header, which breaks GSO and drivers which attempt to use this for encapsulation offloads. Fixes: 1eaa8178 ("vxlan: Add tx-vlan offload support.") Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | Merge branch 'master' of ↵David S. Miller2015-04-092-7/+6
| | |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2015-04-09 1) We dereferenced the xfrm outer_mode too early, larval SAs don't have it set. Move the dereference of the outer mode below the larval SA check to fix it. From Alexey Dobriyan. 2) Fix vti6 tunnel uninit on namespace crosssing. From Yao Xiwei. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | | * | vti6: fix uninit when using x-netnsYao Xiwei2015-04-071-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the kernel deleted a vti6 interface, this interface was not removed from the tunnels list. Thus, when the ip6_vti module was removed, this old interface was found and the kernel tried to delete it again. This was leading to a kernel panic. Fixes: 61220ab34948 ("vti6: Enable namespace changing") Signed-off-by: Yao Xiwei <xiwei.yao@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| | | * | xfrm: fix xfrm_input/xfrm_tunnel_check oopsAlexey Dobriyan2015-04-071-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://bugzilla.kernel.org/show_bug.cgi?id=95211 Commit 70be6c91c86596ad2b60c73587880b47df170a41 ("xfrm: Add xfrm_tunnel_skb_cb to the skb common buffer") added check which dereferences ->outer_mode too early but larval SAs don't have this pointer set (yet). So check for tunnel stuff later. Mike Noordermeer reported this bug and patiently applied all the debugging. Technically this is remote-oops-in-interrupt-context type of thing. BUG: unable to handle kernel NULL pointer dereference at 0000000000000034 IP: [<ffffffff8150dca2>] xfrm_input+0x3c2/0x5a0 ... [<ffffffff81500fc6>] ? xfrm4_esp_rcv+0x36/0x70 [<ffffffff814acc9a>] ? ip_local_deliver_finish+0x9a/0x200 [<ffffffff81471b83>] ? __netif_receive_skb_core+0x6f3/0x8f0 ... RIP [<ffffffff8150dca2>] xfrm_input+0x3c2/0x5a0 Kernel panic - not syncing: Fatal exception in interrupt Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| | * | | usbnet: rename work handlerOliver Neukum2015-04-091-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | "kevent" is an extremely generic name that causes trouble if debugging for work queues is used. So change it to something clear. Signed-off-by: Oliver Neukum <oneukum@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | stmmac: devm_reset_control_get can return PROBE_DEFERPhil Reid2015-04-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In socfpga_dwmac_parse_data forward error code from devm_reset_control_get. This gives the driver another chance to laod if altr,rst-mgr is loaded after the network driver. Signed-off-by: Phil Reid <preid@electromag.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | be2net: Fix a bug in Rx buffer postingAjit Khaparde2015-04-082-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The numPosted field in the ERX Doorbell register is 8-bits wide. So the max buffers that we can post at a time is 255 and not 256 which we are doing currently. Signed-off-by: Ajit Khaparde <ajit.khaparde@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | fou: Don't use const __read_mostlyAndi Kleen2015-04-081-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | const __read_mostly is a senseless combination. If something is already const it cannot be __read_mostly. Remove the bogus __read_mostly in the fou driver. This fixes section conflicts with LTO. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | net: phy: broadcom: Add BCM54616S phy entryAlessio Igor Bogani2015-04-083-2/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Alessio Igor Bogani <alessio.bogani@elettra.eu> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | Merge branch 'rds'David S. Miller2015-04-084-8/+38
| | |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sowmini Varadhan says: ==================== RDS: RDS-core fixes This patch-series updates the RDS core and rds-tcp modules with some bug fixes that were originally authored by Andy Grover, Zach Brown, and Chris Mason. v2: Code review comment by Sergei Shtylov V3: DaveM comments: - dropped patches 3, 5 for "heuristic" changes in rds_send_xmit(). Investigation into the root-cause of these IB-triggered changes produced the feedback: "I don't remember seeing "RDS: Stuck RM" message in last 1-1.5 years and checking with other folks. It may very well be some old workaround for stale connection for which long term fix is already made and this part of code not exercised anymore." Any such fixes, *if* they are needed, can/should be done in the IB specific RDS transport modules. - similarly dropped the LL_SEND_FULL patch (patch 6 in v2 set) v4: Documentation/networking/rds.txt contains incorrect references to "missing sysctl values for pf_rds and sol_rds in mainline". The sysctl values were never needed in mainline, thus fix the documentation. v5: Clarify comment per http://www.spinics.net/lists/netdev/msg324220.html v6: Re-added entire version history to cover letter. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | | * | | RDS: make sure not to loop forever inside rds_send_xmitSowmini Varadhan2015-04-083-2/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a determined set of concurrent senders keep the send queue full, we can loop forever inside rds_send_xmit. This fix has two parts. First we are dropping out of the while(1) loop after we've processed a large batch of messages. Second we add a generation number that gets bumped each time the xmit bit lock is acquired. If someone else has jumped in and made progress in the queue, we skip our goto restart. Original patch by Chris Mason. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | | * | | RDS: only use passive connections when addresses matchSowmini Varadhan2015-04-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Passive connections were added for the case where one loopback IB connection between identical addresses needs another connection to store the second QP. Unfortunately, they were also created in the case where the addesses differ and we already have both QPs. This lead to a message reordering bug. - two different IB interfaces and addresses on a machine: A B - traffic is sent from A to B - connection from A-B is created, connect request sent - listening accepts connect request, B-A is created - traffic flows, next_rx is incremented - unacked messages exist on the retrans list - connection A-B is shut down, new connect request sent - listen sees existing loopback B-A, creates new passive B-A - retrans messages are sent and delivered because of 0 next_rx The problem is that the second connection request saw the previously existing parent connection. Instead of using it, and using the existing next_rx_seq state for the traffic between those IPs, it mistakenly thought that it had to create a passive connection. We fix this by only using passive connections in the special case where laddr and faddr match. In this case we'll only ever have one parent sending connection requests and one passive connection created as the listening path sees the existing parent connection which initiated the request. Original patch by Zach Brown Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | | * | | RDS: Documentation: Document AF_RDS, PF_RDS and SOL_RDS correctly.Sowmini Varadhan2015-04-081-5/+4
| | |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | AF_RDS, PF_RDS and SOL_RDS are available in header files, and there is no need to get their values from /proc. Document this correctly. Fixes: 0c5f9b8830aa ("RDS: Documentation") Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | netem: Fixes byte backlog accounting for the first of two chained netem ↵Beshay, Joseph2015-04-071-1/+2
| | |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | instances Fixes byte backlog accounting for the first of two chained netem instances. Bytes backlog reported now corresponds to the number of queued packets. When two netem instances are chained, for instance to apply rate and queue limitation followed by packet delay, the number of backlogged bytes reported by the first netem instance is wrong. It reports the sum of bytes in the queues of the first and second netem. The first netem reports the correct number of backlogged packets but not bytes. This is shown in the example below. Consider a chain of two netem schedulers created using the following commands: $ tc -s qdisc replace dev veth2 root handle 1:0 netem rate 10000kbit limit 100 $ tc -s qdisc add dev veth2 parent 1:0 handle 2: netem delay 50ms Start an iperf session to send packets out on the specified interface and monitor the backlog using tc: $ tc -s qdisc show dev veth2 Output using unpatched netem: qdisc netem 1: root refcnt 2 limit 100 rate 10000Kbit Sent 98422639 bytes 65434 pkt (dropped 123, overlimits 0 requeues 0) backlog 172694b 73p requeues 0 qdisc netem 2: parent 1: limit 1000 delay 50.0ms Sent 98422639 bytes 65434 pkt (dropped 0, overlimits 0 requeues 0) backlog 63588b 42p requeues 0 The interface used to produce this output has an MTU of 1500. The output for backlogged bytes behind netem 1 is 172694b. This value is not correct. Consider the total number of sent bytes and packets. By dividing the number of sent bytes by the number of sent packets, we get an average packet size of ~=1504. If we divide the number of backlogged bytes by packets, we get ~=2365. This is due to the first netem incorrectly counting the 63588b which are in netem 2's queue as being in its own queue. To verify this is the case, we subtract them from the reported value and divide by the number of packets as follows: 172694 - 63588 = 109106 bytes actualled backlogged in netem 1 109106 / 73 packets ~= 1494 bytes (which matches our MTU) The root cause is that the byte accounting is not done at the same time with packet accounting. The solution is to update the backlog value every time the packet queue is updated. Signed-off-by: Joseph D Beshay <joseph.beshay@utdallas.edu> Acked-by: Hagen Paul Pfeifer <hagen@jauu.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | Merge branch 'cxgb4-next'David S. Miller2015-04-142-17/+62
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Hariprasad Shenai says: ==================== cxgb4: Misc. fixes for sge Increases value of MAX_IMM_TX_PKT_LEN to improve latency, fill freelist starving threshold based on adapter type, add comments for tx flits and sge length code and don't call t4_slow_intr_handler when we are not master PF. This patch series has been created against net-next tree and includes patches on cxgb4 driver We have included all the maintainers of respective drivers. Kindly review the change and let us know in case of any review comments. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | cxgb4: Don't call t4_slow_intr_handler when we're not the Master PFHariprasad Shenai2015-04-142-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | cxgb4: Add comment for calculate tx flits and sge length codeHariprasad Shenai2015-04-141-1/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add comment for tx filt and sge length calucaltion code, also remove a hardcoded value Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | cxgb4: Use device node in page allocationHariprasad Shenai2015-04-141-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | cxgb4: Freelist starving threshold varies from adapter to adapterHariprasad Shenai2015-04-141-10/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | fl_starv_thres could be different from adapter to adapter, don't use hardcoded values Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | cxgb4: Increased the value of MAX_IMM_TX_PKT_LEN from 128 to 256 bytesHariprasad Shenai2015-04-141-1/+1
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | This allows a significant latency drop for packets of sizes between 128 and 192 bytes Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | bgmac: drop ring->num_slotsFelix Fietkau2015-04-142-15/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ring size is always known at compile time, so make the code a bit more efficient Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | bgmac: fix DMA rx corruptionFelix Fietkau2015-04-141-9/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The driver needs to inform the hardware about the first invalid (not yet filled) rx slot, by writing its DMA descriptor pointer offset to the BGMAC_DMA_RX_INDEX register. This register was set to a value exceeding the rx ring size, effectively allowing the hardware constant access to the full ring, regardless of which slots are initialized. To fix this issue, always mark the last filled rx slot as invalid. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | bgmac: simplify dma init/cleanupFelix Fietkau2015-04-141-39/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of allocating buffers at device init time and initializing descriptors at device open, do both at the same time (during open). Free all buffers when closing the device. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Acked-by: Rafał Miłecki <zajec5@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | bgmac: increase rx ring size from 511 to 512Felix Fietkau2015-04-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Limiting it to 511 looks like a failed attempt at leaving one descriptor empty to allow the hardware to stop processing a buffer that has not been prepared yet. However, this doesn't work because this affects the total ring size as well Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | bgmac: add check for oversized packetsFelix Fietkau2015-04-141-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In very rare cases, the MAC can catch an internal buffer that is bigger than it's supposed to be. Instead of crashing the kernel, simply pass the buffer back to the hardware Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | bgmac: simplify/optimize rx DMA error handlingFelix Fietkau2015-04-141-37/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allocate a new buffer before processing the completed one. If allocation fails, reuse the old buffer. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Acked-by: Rafał Miłecki <zajec5@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | bgmac: set received skb headroom to NET_SKB_PADFelix Fietkau2015-04-142-7/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A packet buffer offset of 30 bytes is inefficient, because the first 2 bytes end up in a different cacheline. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | bgmac: leave interrupts disabled as long as there is work to doFelix Fietkau2015-04-142-22/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Always poll rx and tx during NAPI poll instead of relying on the status of the first interrupt. This prevents bgmac_poll from leaving unfinished work around until the next IRQ. In my tests this makes bridging/routing throughput under heavy load more stable and ensures that no new IRQs arrive as long as bgmac_poll uses up the entire budget. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | bgmac: simplify tx ring index handlingFelix Fietkau2015-04-142-29/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Keep incrementing ring->start and ring->end instead of pointing it to the actual ring slot entry. This simplifies the calculation of the number of free slots. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Acked-by: Rafał Miłecki <zajec5@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | toshiba: Remove celleb from Kconfig optionsDaniel Axtens2015-04-141-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The toshiba drivers had celleb as an optional dependency. celleb has been dropped [1], so clean that out of Kconfig. [1] http://patchwork.ozlabs.org/patch/451730/ CC: netdev@vger.kernel.org CC: Valentin Rothberg <valentinrothberg@gmail.com> CC: mpe@ellerman.id.au CC: linuxppc-dev@lists.ozlabs.org Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | hv_netvsc: Implement partial copy into send bufferHaiyang Zhang2015-04-143-19/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If remaining space in a send buffer slot is too small for the whole message, we only copy the RNDIS header and PPI data into send buffer, so we can batch one more packet each time. It reduces the vmbus per-message overhead. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | Merge branch 'for-davem' of ↵David S. Miller2015-04-1387-387/+494
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Al Viro says: ==================== netdev-related stuff in vfs.git There are several commits sitting in vfs.git that probably ought to go in via net-next.git. First of all, there's merge with vfs.git#iocb - that's Christoph's aio rework, which has triggered conflicts with the ->sendmsg() and ->recvmsg() patches a while ago. It's not so much Christoph's stuff that ought to be in net-next, as (pretty simple) conflict resolution on merge. The next chunk is switch to {compat_,}import_iovec/import_single_range - new safer primitives for initializing iov_iter. The primitives themselves come from vfs/git#iov_iter (and they are used quite a lot in vfs part of queue), conversion of net/socket.c syscalls belongs in net-next, IMO. Next there's afs and rxrpc stuff from dhowells. And then there's sanitizing kernel_sendmsg et.al. + missing inlined helper for "how much data is left in msg->msg_iter" - this stuff is used in e.g. cifs stuff, but it belongs in net-next. That pile is pullable from git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git for-davem I'll post the individual patches in there in followups; could you take a look and tell if everything in there is OK with you? ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | new helper: msg_data_left()Al Viro2015-04-118-23/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | convert open-coded instances Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| | * | | Merge remote-tracking branch 'dh/afs' into for-davemAl Viro2015-04-117-35/+166
| | |\ \ \
| | | * | | kafs: Add more "unified AFS" error codesNathaniel Wesley Filardo2015-04-011-1/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This should cover the set emitted by viced and the volume server. Signed-off-by: Nathaniel Wesley Filardo <nwf@cs.jhu.edu> Signed-off-by: David Howells <dhowells@redhat.com>
| | | * | | RxRPC: Handle VERSION Rx protocol packetsDavid Howells2015-04-014-2/+124
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Handle VERSION Rx protocol packets. We should respond to a VERSION packet with a string indicating the Rx version. This is a maximum of 64 characters and is padded out to 65 chars with NUL bytes. Note that other AFS clients use the version request as a NAT keepalive so we need to handle it rather than returning an abort. The standard formulation seems to be: <project> <version> built <yyyy>-<mm>-<dd> for example: " OpenAFS 1.6.2 built 2013-05-07 " (note the three extra spaces) as obtained with: rxdebug grand.mit.edu -version from the openafs package. Signed-off-by: David Howells <dhowells@redhat.com>
| | | * | | AFS: afs_send_empty_reply() doesn't require an iovec arrayDavid Howells2015-04-011-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | afs_send_empty_reply() doesn't require an iovec array with which to initialise the msghdr, but can pass NULL instead. Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David Howells <dhowells@redhat.com>
| | | * | | RxRPC: Use iov_iter_count() in rxrpc_send_data() instead of the len argumentDavid Howells2015-04-011-13/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use iov_iter_count() in rxrpc_send_data() to get the remaining data length instead of using the len argument as the len argument is now redundant. Signed-off-by: David Howells <dhowells@redhat.com>
| | | * | | RxRPC: Don't call skb_add_data() if there's no data to copyDavid Howells2015-04-011-19/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Don't call skb_add_data() in rxrpc_send_data() if there's no data to copy and also skip the calculations associated with it in such a case. Signed-off-by: David Howells <dhowells@redhat.com>
| | | * | | RxRPC: Fix the conversion to iov_iterDavid Howells2015-04-011-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit: commit af2b040e470b470bfc881981db3c796072853eae Author: Al Viro <viro@zeniv.linux.org.uk> Date: Thu Nov 27 21:44:24 2014 -0500 Subject: rxrpc: switch rxrpc_send_data() to iov_iter primitives incorrectly changes a do-while loop into a while loop in rxrpc_send_data(). Unfortunately, at least one pass through the loop is required - even if there is no data - so that the packet the closes the send phase can be sent if MSG_MORE is not set. Signed-off-by: David Howells <dhowells@redhat.com>
| | * | | | get rid of the size argument of sock_sendmsg()Al Viro2015-04-113-15/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | it's equal to iov_iter_count(&msg->msg_iter) in all cases Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| | * | | | switch kernel_sendmsg() and kernel_recvmsg() to iov_iter_kvec()Al Viro2015-04-091-17/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For kernel_sendmsg() that eliminates the need to play with setfs(); for kernel_recvmsg() it does *not* - a couple of callers are using it with non-NULL ->msg_control, which would be treated as userland address on recvmsg side of things. In all cases we are really setting a kvec-backed iov_iter, though. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| | * | | | net: switch importing msghdr from userland to {compat_,}import_iovec()Al Viro2015-04-093-31/+20
| | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| | * | | | net: switch sendto() and recvfrom() to import_single_range()Al Viro2015-04-091-16/+8
| | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| | * | | | Merge branch 'iov_iter' into for-davemAl Viro2015-04-092-0/+71
| | |\ \ \ \
| | * \ \ \ \ Merge branch 'iocb' into for-davemAl Viro2015-04-0969-266/+199
| | |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | trivial conflict in net/socket.c and non-trivial one in crypto - that one had evaded aio_complete() removal. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | | tcp/dccp: get rid of central timewait timerEric Dumazet2015-04-1311-386/+69
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Using a timer wheel for timewait sockets was nice ~15 years ago when memory was expensive and machines had a single processor. This does not scale, code is ugly and source of huge latencies (Typically 30 ms have been seen, cpus spinning on death_lock spinlock.) We can afford to use an extra 64 bytes per timewait sock and spread timewait load to all cpus to have better behavior. Tested: On following test, /proc/sys/net/ipv4/tcp_tw_recycle is set to 1 on the target (lpaa24) Before patch : lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0 419594 lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0 437171 While test is running, we can observe 25 or even 33 ms latencies. lpaa24:~# ping -c 1000 -i 0.02 -qn lpaa23 ... 1000 packets transmitted, 1000 received, 0% packet loss, time 20601ms rtt min/avg/max/mdev = 0.020/0.217/25.771/1.535 ms, pipe 2 lpaa24:~# ping -c 1000 -i 0.02 -qn lpaa23 ... 1000 packets transmitted, 1000 received, 0% packet loss, time 20702ms rtt min/avg/max/mdev = 0.019/0.183/33.761/1.441 ms, pipe 2 After patch : About 90% increase of throughput : lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0 810442 lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0 800992 And latencies are kept to minimal values during this load, even if network utilization is 90% higher : lpaa24:~# ping -c 1000 -i 0.02 -qn lpaa23 ... 1000 packets transmitted, 1000 received, 0% packet loss, time 19991ms rtt min/avg/max/mdev = 0.023/0.064/0.360/0.042 ms Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | netfilter: Fix format string of nfnetlink_log proc fileRichard Weinberger2015-04-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The printed values are all of type unsigned integer, therefore use %u instead of %d. Otherwise an user can face negative values. Signed-off-by: Richard Weinberger <richard@nod.at> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | netfilter: Fix format string of nfnetlink_queue proc fileRichard Weinberger2015-04-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The printed values are all of type unsigned integer, therefore use %u instead of %d. Otherwise an user can face negative values. Fixes: $ cat /proc/net/netfilter/nfnetlink_queue 0 29508 278 2 65531 0 2004213241 -2129885586 1 1 -27747 0 2 65531 0 0 0 1 2 -27748 0 2 65531 0 0 0 1 Signed-off-by: Richard Weinberger <richard@nod.at> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
OpenPOWER on IntegriCloud