summaryrefslogtreecommitdiffstats
path: root/drivers/net/virtio_net.c
Commit message (Collapse)AuthorAgeFilesLines
...
| * | net: Explicitly initialize u64_stats_sync structures for lockdepJohn Stultz2013-11-061-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to enable lockdep on seqcount/seqlock structures, we must explicitly initialize any locks. The u64_stats_sync structure, uses a seqcount, and thus we need to introduce a u64_stats_init() function and use it to initialize the structure. This unfortunately adds a lot of fairly trivial initialization code to a number of drivers. But the benefit of ensuring correctness makes this worth while. Because these changes are required for lockdep to be enabled, and the changes are quite trivial, I've not yet split this patch out into 30-some separate patches, as I figured it would be better to get the various maintainers thoughts on how to best merge this change along with the seqcount lockdep enablement. Feedback would be appreciated! Signed-off-by: John Stultz <john.stultz@linaro.org> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: James Morris <jmorris@namei.org> Cc: Jesse Gross <jesse@nicira.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Mirko Lindner <mlindner@marvell.com> Cc: Patrick McHardy <kaber@trash.net> Cc: Roger Luethi <rl@hellgate.ch> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Simon Horman <horms@verge.net.au> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Cc: Wensong Zhang <wensong@linux-vs.org> Cc: netdev@vger.kernel.org Link: http://lkml.kernel.org/r/1381186321-4906-2-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | virtio-net: switch to use XPS to choose txqJason Wang2013-11-051-46/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We used to use a percpu structure vq_index to record the cpu to queue mapping, this is suboptimal since it duplicates the work of XPS and loses all other XPS functionality such as allowing user to configure their own transmission steering strategy. So this patch switches to use XPS and suggest a default mapping when the number of cpus is equal to the number of queues. With XPS support, there's no need for keeping per-cpu vq_index and .ndo_select_queue(), so they were removed also. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michael S. Tsirkin <mst@redhat.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | virtio-net: coalesce rx frags when possible during rxJason Wang2013-11-041-4/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 2613af0ed18a11d5c566a81f9a6510b73180660a (virtio_net: migrate mergeable rx buffers to page frag allocators) try to increase the payload/truesize for MTU-sized traffic. But this will introduce the extra overhead for GSO packets received because of the frag list. This commit tries to reduce this issue by coalesce the possible rx frags when possible during rx. Test result shows the about 15% improvement on full size GSO packet receiving (and even better than before commit 2613af0ed18a11d5c566a81f9a6510b73180660a). Before this commit: ./netperf -H 192.168.100.4 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.100.4 () port 0 AF_INET : demo Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 10.00 20303.87 After this commit: ./netperf -H 192.168.100.4 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.100.4 () port 0 AF_INET : demo Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 10.00 23841.26 Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Michael Dalton <mwdalton@google.com> Cc: Eric Dumazet <edumazet@google.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2013-11-041-7/+6
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/net/ethernet/emulex/benet/be.h drivers/net/netconsole.c net/bridge/br_private.h Three mostly trivial conflicts. The net/bridge/br_private.h conflict was a function signature (argument addition) change overlapping with the extern removals from Joe Perches. In drivers/net/netconsole.c we had one change adjusting a printk message whilst another changed "printk(KERN_INFO" into "pr_info(". Lastly, the emulex change was a new inline function addition overlapping with Joe Perches's extern removals. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | virtio-net: correctly handle cpu hotplug notifier during resumingJason Wang2013-10-291-7/+6
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 3ab098df35f8b98b6553edc2e40234af512ba877 (virtio-net: don't respond to cpu hotplug notifier if we're not ready) tries to bypass the cpu hotplug notifier by checking the config_enable and does nothing is it was false. So it need to try to hold the config_lock mutex which may happen in atomic environment which leads the following warnings: [ 622.944441] CPU0 attaching NULL sched-domain. [ 622.944446] CPU1 attaching NULL sched-domain. [ 622.944485] CPU0 attaching NULL sched-domain. [ 622.950795] BUG: sleeping function called from invalid context at kernel/mutex.c:616 [ 622.950796] in_atomic(): 1, irqs_disabled(): 1, pid: 10, name: migration/1 [ 622.950796] no locks held by migration/1/10. [ 622.950798] CPU: 1 PID: 10 Comm: migration/1 Not tainted 3.12.0-rc5-wl-01249-gb91e82d #317 [ 622.950799] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 622.950802] 0000000000000000 ffff88001d42dba0 ffffffff81a32f22 ffff88001bfb9c70 [ 622.950803] ffff88001d42dbb0 ffffffff810edb02 ffff88001d42dc38 ffffffff81a396ed [ 622.950805] 0000000000000046 ffff88001d42dbe8 ffffffff810e861d 0000000000000000 [ 622.950805] Call Trace: [ 622.950810] [<ffffffff81a32f22>] dump_stack+0x54/0x74 [ 622.950815] [<ffffffff810edb02>] __might_sleep+0x112/0x114 [ 622.950817] [<ffffffff81a396ed>] mutex_lock_nested+0x3c/0x3c6 [ 622.950818] [<ffffffff810e861d>] ? up+0x39/0x3e [ 622.950821] [<ffffffff8153ea7c>] ? acpi_os_signal_semaphore+0x21/0x2d [ 622.950824] [<ffffffff81565ed1>] ? acpi_ut_release_mutex+0x5e/0x62 [ 622.950828] [<ffffffff816d04ec>] virtnet_cpu_callback+0x33/0x87 [ 622.950830] [<ffffffff81a42576>] notifier_call_chain+0x3c/0x5e [ 622.950832] [<ffffffff810e86a8>] __raw_notifier_call_chain+0xe/0x10 [ 622.950835] [<ffffffff810c5556>] __cpu_notify+0x20/0x37 [ 622.950836] [<ffffffff810c5580>] cpu_notify+0x13/0x15 [ 622.950838] [<ffffffff81a237cd>] take_cpu_down+0x27/0x3a [ 622.950841] [<ffffffff81136289>] stop_machine_cpu_stop+0x93/0xf1 [ 622.950842] [<ffffffff81136167>] cpu_stopper_thread+0xa0/0x12f [ 622.950844] [<ffffffff811361f6>] ? cpu_stopper_thread+0x12f/0x12f [ 622.950847] [<ffffffff81119710>] ? lock_release_holdtime.part.7+0xa3/0xa8 [ 622.950848] [<ffffffff81135e4b>] ? cpu_stop_should_run+0x3f/0x47 [ 622.950850] [<ffffffff810ea9b0>] smpboot_thread_fn+0x1c5/0x1e3 [ 622.950852] [<ffffffff810ea7eb>] ? lg_global_unlock+0x67/0x67 [ 622.950854] [<ffffffff810e36b7>] kthread+0xd8/0xe0 [ 622.950857] [<ffffffff81a3bfad>] ? wait_for_common+0x12f/0x164 [ 622.950859] [<ffffffff810e35df>] ? kthread_create_on_node+0x124/0x124 [ 622.950861] [<ffffffff81a45ffc>] ret_from_fork+0x7c/0xb0 [ 622.950862] [<ffffffff810e35df>] ? kthread_create_on_node+0x124/0x124 [ 622.950876] smpboot: CPU 1 is now offline [ 623.194556] SMP alternatives: lockdep: fixing up alternatives [ 623.194559] smpboot: Booting Node 0 Processor 1 APIC 0x1 ... A correct fix is to unregister the hotcpu notifier during restore and register a new one in resume. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Tested-by: Fengguang Wu <fengguang.wu@intel.com> Cc: Wanlong Gao <gaowanlong@cn.fujitsu.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Wanlong Gao <gaowanlong@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | virtio_net: migrate mergeable rx buffers to page frag allocatorsMichael Dalton2013-10-281-58/+106
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | The virtio_net driver's mergeable receive buffer allocator uses 4KB packet buffers. For MTU-sized traffic, SKB truesize is > 4KB but only ~1500 bytes of the buffer is used to store packet data, reducing the effective TCP window size substantially. This patch addresses the performance concerns with mergeable receive buffers by allocating MTU-sized packet buffers using page frag allocators. If more than MAX_SKB_FRAGS buffers are needed, the SKB frag_list is used. Signed-off-by: Michael Dalton <mwdalton@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | virtio-net: refill only when device is up during setting queuesJason Wang2013-10-171-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We used to schedule the refill work unconditionally after changing the number of queues. This may lead an issue if the device is not up. Since we only try to cancel the work in ndo_stop(), this may cause the refill work still work after removing the device. Fix this by only schedule the work when device is up. The bug were introduce by commit 9b9cd8024a2882e896c65222aa421d461354e3f2. (virtio-net: fix the race between channels setting and refill) Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | virtio-net: don't respond to cpu hotplug notifier if we're not readyJason Wang2013-10-171-0/+8
|/ | | | | | | | | | | | | | | | | | | | | We're trying to re-configure the affinity unconditionally in cpu hotplug callback. This may lead the issue during resuming from s3/s4 since - virt queues haven't been allocated at that time. - it's unnecessary since thaw method will re-configure the affinity. Fix this issue by checking the config_enable and do nothing is we're not ready. The bug were introduced by commit 8de4b2f3ae90c8fc0f17eeaab87d5a951b66ee17 (virtio-net: reset virtqueue affinity when doing cpu hotplug). Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Wanlong Gao <gaowanlong@cn.fujitsu.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Wanlong Gao <gaowanlong@cn.fujitsu.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* virtio-net: Set RXCSUM feature if GUEST_CSUM is availableThomas Huth2013-09-031-0/+2
| | | | | | | | | | | | | | If the VIRTIO_NET_F_GUEST_CSUM virtio feature is available, the guest does not have to calculate the checksums on all received packets. This is pretty much the same feature as RX checksum offloading on real network cards, so the virtio-net driver should report this by setting the NETIF_F_RXCSUM flag. When the user now runs "ethtool -k", he or she can see whether the virtio-net interface has to calculate RX checksums or not. Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* virtio-net: put virtio net header inline with dataMichael S. Tsirkin2013-07-271-8/+34
| | | | | | | | | | | | | | For small packets we can simplify xmit processing by linearizing buffers with the header: most packets seem to have enough head room we can use for this purpose. Since existing hypervisors require that header is the first s/g element, we need a feature bit for this. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge tag 'virtio-next-for-linus' of ↵Linus Torvalds2013-07-101-4/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull virtio updates from Rusty Russell: "No real surprises" * tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: MAINTAINERS: add tools/virtio/ under virtio tools/virtio: move module license stub to module.h virtio: include asm/barrier explicitly virtio: VIRTIO_F_ANY_LAYOUT feature lguest: fix example launcher compilation for broken glibc headers. virtio-net: fix the race between channels setting and refill tools/lguest: real barriers. tools/lguest: fix missing rmb(). virtio_balloon: leak_balloon(): only tell host if we got pages deflated virtio-pci: fix leaks of msix_affinity_masks Fix comment typo "CONFIG_PAE"
| * virtio-net: fix the race between channels setting and refillJason Wang2013-07-041-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 55257d72bd1c51f25106350f4983ec19f62ed1fa (virtio-net: fill only rx queues which are being used) tries to refill on demand when changing the number of channels by call try_refill_recv() directly, this may race: - the refill work who may do the refill in the same time - the try_refill_recv() called in bh since napi was not disabled Which may led guest complain during setting channels: virtio_net virtio0: input.1:id 0 is not a head! Solve this issue by scheduling a refill work which can guarantee the serialization of refill. Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* | virtio_net: fix race in RX VQ processingMichael S. Tsirkin2013-07-091-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | virtio net called virtqueue_enable_cq on RX path after napi_complete, so with NAPI_STATE_SCHED clear - outside the implicit napi lock. This violates the requirement to synchronize virtqueue_enable_cq wrt virtqueue_add_buf. In particular, used event can move backwards, causing us to lose interrupts. In a debug build, this can trigger panic within START_USE. Jason Wang reports that he can trigger the races artificially, by adding udelay() in virtqueue_enable_cb() after virtio_mb(). However, we must call napi_complete to clear NAPI_STATE_SCHED before polling the virtqueue for used buffers, otherwise napi_schedule_prep in a callback will fail, causing us to lose RX events. To fix, call virtqueue_enable_cb_prepare with NAPI_STATE_SCHED set (under napi lock), later call virtqueue_poll with NAPI_STATE_SCHED clear (outside the lock). Reported-by: Jason Wang <jasowang@redhat.com> Tested-by: Jason Wang <jasowang@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | virtio_net: enable napi for all possible queues during openJason Wang2013-05-231-4/+5
|/ | | | | | | | | | | | | | | Commit 55257d72bd1c51f25106350f4983ec19f62ed1fa (virtio-net: fill only rx queues which are being used) only does the napi enabling during open for curr_queue_pairs. This will break multiqueue receiving since napi of new queues were still disabled after changing the number of queues. This patch fixes this by enabling napi for all possible queues during open. Cc: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* virtio_net: use default napi weight by defaultAmerigo Wang2013-05-111-1/+1
| | | | | | | | | | | | | Since commit 82dc3c63c692b1e1d5937 ("net: introduce NAPI_POLL_WEIGHT") we warn drivers when they use napi weight higher than NAPI_POLL_WEIGHT, but virtio_net still uses 128 by default. This patch makes its default value to NAPI_POLL_WEIGHT. Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge tag 'virtio-next-for-linus' of ↵Linus Torvalds2013-05-021-36/+41
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull virtio & lguest updates from Rusty Russell: "Lots of virtio work which wasn't quite ready for last merge window. Plus I dived into lguest again, reworking the pagetable code so we can move the switcher page: our fixmaps sometimes take more than 2MB now..." Ugh. Annoying conflicts with the tcm_vhost -> vhost_scsi rename. Hopefully correctly resolved. * tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (57 commits) caif_virtio: Remove bouncing email addresses lguest: improve code readability in lg_cpu_start. virtio-net: fill only rx queues which are being used lguest: map Switcher below fixmap. lguest: cache last cpu we ran on. lguest: map Switcher text whenever we allocate a new pagetable. lguest: don't share Switcher PTE pages between guests. lguest: expost switcher_pages array (as lg_switcher_pages). lguest: extract shadow PTE walking / allocating. lguest: make check_gpte et. al return bool. lguest: assume Switcher text is a single page. lguest: rename switcher_page to switcher_pages. lguest: remove RESERVE_MEM constant. lguest: check vaddr not pgd for Switcher protection. lguest: prepare to make SWITCHER_ADDR a variable. virtio: console: replace EMFILE with EBUSY for already-open port virtio-scsi: reset virtqueue affinity when doing cpu hotplug virtio-scsi: introduce multiqueue support virtio-scsi: push vq lock/unlock into virtscsi_vq_done virtio-scsi: pass struct virtio_scsi to virtqueue completion function ...
| * virtio-net: fill only rx queues which are being usedSasha Levin2013-04-291-5/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Due to MQ support we may allocate a whole bunch of rx queues but never use them. With this patch we'll safe the space used by the receive buffers until they are actually in use: sh-4.2# free -h total used free shared buffers cached Mem: 490M 35M 455M 0B 0B 4.1M -/+ buffers/cache: 31M 459M Swap: 0B 0B 0B sh-4.2# ethtool -L eth0 combined 8 sh-4.2# free -h total used free shared buffers cached Mem: 490M 162M 327M 0B 0B 4.1M -/+ buffers/cache: 158M 331M Swap: 0B 0B 0B Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * virtio_net: use simplified virtqueue accessors.Rusty Russell2013-03-201-6/+5
| | | | | | | | | | | | | | | | We never add buffers with input and output parts, so use the new accessors. Cc: "Michael S. Tsirkin" <mst@redhat.com> Reviewed-by: Asias He <asias@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * virtio_net: use virtqueue_add_sgs[] for command buffers.Rusty Russell2013-03-201-25/+26
| | | | | | | | | | | | | | | | It's a bit cleaner to hand multiple sgs, rather than one big one. Cc: "Michael S. Tsirkin" <mst@redhat.com> Tested-by: Wanlong Gao <gaowanlong@cn.fujitsu.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* | net: vlan: prepare for 802.1ad VLAN filtering offloadPatrick McHardy2013-04-191-2/+4
| | | | | | | | | | | | | | | | | | Change the rx_{add,kill}_vid callbacks to take a protocol argument in preparation of 802.1ad support. The protocol argument used so far is always htons(ETH_P_8021Q). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: vlan: rename NETIF_F_HW_VLAN_* feature flags to NETIF_F_HW_VLAN_CTAG_*Patrick McHardy2013-04-191-1/+1
| | | | | | | | | | | | | | | | | | | | Rename the hardware VLAN acceleration features to include "CTAG" to indicate that they only support CTAGs. Follow up patches will introduce 802.1ad server provider tagging (STAGs) and require the distinction for hardware not supporting acclerating both. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | virtio-net: initialize vlan_featuresJason Wang2013-04-111-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's nothing that prevent passing the device features of virtio_net to its vlan device. So this patch simply passes those to vlan device to benefit from advanced features. Netperf shows better sending performance for vlan device since TSO can work on vlan now. before: netperf -H 192.168.5.2 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.5.2 () port 0 AF_INET : demo Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 10.00 4162.35 after: netperf -H 192.168.5.2 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.5.2 () port 0 AF_INET : demo Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 10.00 9365.42 Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | virtio: remove obsolete virtqueue_get_queue_index()Rusty Russell2013-03-221-2/+2
|/ | | | | | | | | | | You can access it directly now, since 3.8: v3.7-rc1-13-g06ca287 'virtio: move queue_index and num_free fields into core struct virtqueue.' Cc: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge tag 'virtio-next-for-linus' of ↵Linus Torvalds2013-02-261-11/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull virtio updates from Rusty Russell: "All trivial, thanks to the stuff which didn't quite make it time" * tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: virtio_console: Initialize guest_connected=true for rproc_serial virtio: use module_virtio_driver. virtio: Add module driver macro for virtio drivers. virtio_console: Use virtio device index to generate port name virtio: make pci_device_id const virtio: make config_ops const virtio-mmio: fix wrong comment about register offset virtio_console: Let unconnected rproc device receive data.
| * virtio: use module_virtio_driver.Rusty Russell2013-02-131-11/+1
| | | | | | | | Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* | net: Fix possible wrong checksum generation.Pravin B Shelar2013-02-131-8/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch cef401de7be8c4e (net: fix possible wrong checksum generation) fixed wrong checksum calculation but it broke TSO by defining new GSO type but not a netdev feature for that type. net_gso_ok() would not allow hardware checksum/segmentation offload of such packets without the feature. Following patch fixes TSO and wrong checksum. This patch uses same logic that Eric Dumazet used. Patch introduces new flag SKBTX_SHARED_FRAG if at least one frag can be modified by the user. but SKBTX_SHARED_FRAG flag is kept in skb shared info tx_flags rather than gso_type. tx_flags is better compared to gso_type since we can have skb with shared frag without gso packet. It does not link SHARED_FRAG to GSO, So there is no need to define netdev feature for this. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | drivers:net:misc: Remove unnecessary alloc/OOM messagesJoe Perches2013-02-041-3/+1
| | | | | | | | | | | | | | | | alloc failures already get standardized OOM messages and a dump_stack. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2013-01-291-20/+98
|\ \ | | | | | | | | | | | | | | | | | | Bring in the 'net' tree so that we can get some ipv4/ipv6 bug fixes that some net-next work will build upon. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | virtio-net: reset virtqueue affinity when doing cpu hotplugWanlong Gao2013-01-271-0/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a cpu notifier to virtio-net, so that we can reset the virtqueue affinity if the cpu hotplug happens. It improve the performance through enabling or disabling the virtqueue affinity after doing cpu hotplug. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Eric Dumazet <erdnetdev@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: virtualization@lists.linux-foundation.org Cc: netdev@vger.kernel.org Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | virtio-net: split out clean affinity functionWanlong Gao2013-01-271-36/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Split out the clean affinity function to virtnet_clean_affinity(). Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Eric Dumazet <erdnetdev@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: virtualization@lists.linux-foundation.org Cc: netdev@vger.kernel.org Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | virtio-net: fix the set affinity bug when CPU IDs are not consecutiveWanlong Gao2013-01-271-13/+54
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As Michael mentioned, set affinity and select queue will not work very well when CPU IDs are not consecutive, this can happen with hot unplug. Fix this bug by traversal the online CPUs, and create a per cpu variable to find the mapping from CPU to the preferable virtual-queue. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Eric Dumazet <erdnetdev@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: virtualization@lists.linux-foundation.org Cc: netdev@vger.kernel.org Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: fix possible wrong checksum generationEric Dumazet2013-01-281-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pravin Shelar mentioned that GSO could potentially generate wrong TX checksum if skb has fragments that are overwritten by the user between the checksum computation and transmit. He suggested to linearize skbs but this extra copy can be avoided for normal tcp skbs cooked by tcp_sendmsg(). This patch introduces a new SKB_GSO_SHARED_FRAG flag, set in skb_shinfo(skb)->gso_type if at least one frag can be modified by the user. Typical sources of such possible overwrites are {vm}splice(), sendfile(), and macvtap/tun/virtio_net drivers. Tested: $ netperf -H 7.7.8.84 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.8.84 () port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 10.00 3959.52 $ netperf -H 7.7.8.84 -t TCP_SENDFILE TCP SENDFILE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.8.84 () port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 10.00 3216.80 Performance of the SENDFILE is impacted by the extra allocation and copy, and because we use order-0 pages, while the TCP_STREAM uses bigger pages. Reported-by: Pravin Shelar <pshelar@nicira.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | virtio-net: introduce a new control to set macaddrAmos Kong2013-01-211-3/+18
| | | | | | | | | | | | | | | | | | | | | | | | Currently we write MAC address to pci config space byte by byte, this means that we have an intermediate step where mac is wrong. This patch introduced a new control command to set MAC address, it's atomic. VIRTIO_NET_F_CTRL_MAC_ADDR is a new feature bit for compatibility. Signed-off-by: Amos Kong <akong@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | move virtnet_send_command() above virtnet_set_mac_address()Amos Kong2013-01-211-45/+44
|/ | | | | | | | | We want to send vq command to set mac address in virtnet_set_mac_address(), so do this function moving. Fixed a little issue of coding style. Signed-off-by: Amos Kong <akong@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge tag 'virtio-next-for-linus' of ↵Linus Torvalds2012-12-201-29/+19
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull virtio update from Rusty Russell: "Some nice cleanups, and even a patch my wife did as a "live" demo for Latinoware 2012. There's a slightly non-trivial merge in virtio-net, as we cleaned up the virtio add_buf interface while DaveM accepted the mq virtio-net patches." * tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (27 commits) virtio_console: Add support for remoteproc serial virtio_console: Merge struct buffer_token into struct port_buffer virtio: add drv_to_virtio to make code clearly virtio: use dev_to_virtio wrapper in virtio virtio-mmio: Fix irq parsing in command line parameter virtio_console: Free buffers from out-queue upon close virtio: Convert dev_printk(KERN_<LEVEL> to dev_<level>( virtio_console: Use kmalloc instead of kzalloc virtio_console: Free buffer if splice fails virtio: tools: make it clear that virtqueue_add_buf() no longer returns > 0 virtio: scsi: make it clear that virtqueue_add_buf() no longer returns > 0 virtio: rpmsg: make it clear that virtqueue_add_buf() no longer returns > 0 virtio: net: make it clear that virtqueue_add_buf() no longer returns > 0 virtio: console: make it clear that virtqueue_add_buf() no longer returns > 0 virtio: make virtqueue_add_buf() returning 0 on success, not capacity. virtio: console: don't rely on virtqueue_add_buf() returning capacity. virtio_net: don't rely on virtqueue_add_buf() returning capacity. virtio-net: remove unused skb_vnet_hdr->num_sg field virtio-net: correct capacity math on ring full virtio: move queue_index and num_free fields into core struct virtqueue. ...
| * virtio: net: make it clear that virtqueue_add_buf() no longer returns > 0Rusty Russell2012-12-181-1/+1
| | | | | | | | | | | | We simplified virtqueue_add_buf(), make it clear in the callers. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * virtio_net: don't rely on virtqueue_add_buf() returning capacity.Rusty Russell2012-12-181-20/+13
| | | | | | | | | | | | | | | | | | | | Now we can easily use vq->num_free to determine if there are descriptors left in the queue, we're about to change virtqueue_add_buf() to return 0 on success. The virtio_net driver is the only one which actually uses the return value, so change that. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Michael S. Tsirkin <mst@redhat.com>
| * virtio-net: remove unused skb_vnet_hdr->num_sg fieldMichael S. Tsirkin2012-12-181-3/+3
| | | | | | | | | | | | | | | | [Split from "correct capacity math on ring full" -- Rusty] Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Michael S. Tsirkin <mst@redhat.com>
| * virtio-net: correct capacity math on ring fullMichael S. Tsirkin2012-12-181-5/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Capacity math on ring full is wrong: we are looking at num_sg but that might be optimistic because of indirect buffer use. The implementation also penalizes fast path with extra memory accesses for the benefit of ring full condition handling which is slow path. It's easy to query ring capacity so let's do just that. This change also makes it easier to move vnet header for tx around as follow-up patch does. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Michael S. Tsirkin <mst@redhat.com>
* | virtio_net: fix a typo in virtnet_alloc_queues()Amerigo Wang2012-12-101-1/+1
| | | | | | | | | | | | | | | | | | | | Obviously it should check !vi->rq. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Cc: Jason Wang <jasowang@redhat.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | virtio-net: support changing the number of queue pairs through ethtoolJason Wang2012-12-091-0/+43
| | | | | | | | | | | | | | | | This patch implements the ethtool_{set|get}_channels method of virtio-net to allow user to change the number of queues when the device is running on demand. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | virtio_net: multiqueue supportJason Wang2012-12-091-98/+375
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds the multiqueue (VIRTIO_NET_F_MQ) support to virtio_net driver. VIRTIO_NET_F_MQ capable device could allow the driver to do packet transmission and reception through multiple queue pairs and does the packet steering to get better performance. By default, one one queue pair is used, user could change the number of queue pairs by ethtool in the next patch. When multiple queue pairs is used and the number of queue pairs is equal to the number of vcpus. Driver does the following optimizations to implement per-cpu virt queue pairs: - select the txq based on the smp processor id. - smp affinity hint to the cpu that owns the queue pairs. This could be used with the flow steering support of the device to guarantee the packets of a single flow is handled by the same cpu. Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | virtio-net: separate fields of sending/receiving queue from virtnet_infoJason Wang2012-12-091-124/+158
| | | | | | | | | | | | | | | | | | | | | | To support multiqueue transmitq/receiveq, the first step is to separate queue related structure from virtnet_info. This patch introduce send_queue and receive_queue structure and use the pointer to them as the parameter in functions handling sending/receiving. Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | virtio_net: remove __dev* attributesBill Pemberton2012-12-031-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | CONFIG_HOTPLUG is going away as an option. As result the __dev* markings will be going away. Remove use of __devinit, __devexit_p, __devinitdata, __devinitconst, and __devexit. Signed-off-by: Bill Pemberton <wfp5p@virginia.edu> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: virtualization@lists.linux-foundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | virtio_net: use net_*_ratelimited() helpersAmerigo Wang2012-11-091-8/+4
|/ | | | | | | | | | These can be converted to net_*_ratelimited(). Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds2012-10-021-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking changes from David Miller: 1) GRE now works over ipv6, from Dmitry Kozlov. 2) Make SCTP more network namespace aware, from Eric Biederman. 3) TEAM driver now works with non-ethernet devices, from Jiri Pirko. 4) Make openvswitch network namespace aware, from Pravin B Shelar. 5) IPV6 NAT implementation, from Patrick McHardy. 6) Server side support for TCP Fast Open, from Jerry Chu and others. 7) Packet BPF filter supports MOD and XOR, from Eric Dumazet and Daniel Borkmann. 8) Increate the loopback default MTU to 64K, from Eric Dumazet. 9) Use a per-task rather than per-socket page fragment allocator for outgoing networking traffic. This benefits processes that have very many mostly idle sockets, which is quite common. From Eric Dumazet. 10) Use up to 32K for page fragment allocations, with fallbacks to smaller sizes when higher order page allocations fail. Benefits are a) less segments for driver to process b) less calls to page allocator c) less waste of space. From Eric Dumazet. 11) Allow GRO to be used on GRE tunnels, from Eric Dumazet. 12) VXLAN device driver, one way to handle VLAN issues such as the limitation of 4096 VLAN IDs yet still have some level of isolation. From Stephen Hemminger. 13) As usual there is a large boatload of driver changes, with the scale perhaps tilted towards the wireless side this time around. Fix up various fairly trivial conflicts, mostly caused by the user namespace changes. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1012 commits) hyperv: Add buffer for extended info after the RNDIS response message. hyperv: Report actual status in receive completion packet hyperv: Remove extra allocated space for recv_pkt_list elements hyperv: Fix page buffer handling in rndis_filter_send_request() hyperv: Fix the missing return value in rndis_filter_set_packet_filter() hyperv: Fix the max_xfer_size in RNDIS initialization vxlan: put UDP socket in correct namespace vxlan: Depend on CONFIG_INET sfc: Fix the reported priorities of different filter types sfc: Remove EFX_FILTER_FLAG_RX_OVERRIDE_IP sfc: Fix loopback self-test with separate_tx_channels=1 sfc: Fix MCDI structure field lookup sfc: Add parentheses around use of bitfield macro arguments sfc: Fix null function pointer in efx_sriov_channel_type vxlan: virtual extensible lan igmp: export symbol ip_mc_leave_group netlink: add attributes to fdb interface tg3: unconditionally select HWMON support when tg3 is enabled. Revert "net: ti cpsw ethernet: allow reading phy interface mode from DT" gre: fix sparse warning ...
| * net: move and rename netif_notify_peers()Amerigo Wang2012-08-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | I believe net/core/dev.c is a better place for netif_notify_peers(), because other net event notify functions also stay in this file. And rename it to netdev_notify_peers(). Cc: David S. Miller <davem@davemloft.net> Cc: Ian Campbell <Ian.Campbell@citrix.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | workqueue: deprecate system_nrt[_freezable]_wqTejun Heo2012-08-201-6/+6
|/ | | | | | | | | | | | | | | | | | | | | | system_nrt[_freezable]_wq are now spurious. Mark them deprecated and convert all users to system[_freezable]_wq. If you're cc'd and wondering what's going on: Now all workqueues are non-reentrant, so there's no reason to use system_nrt[_freezable]_wq. Please use system[_freezable]_wq instead. This patch doesn't make any functional difference. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-By: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: David Airlie <airlied@linux.ie> Cc: Jiri Kosina <jkosina@suse.cz> Cc: "David S. Miller" <davem@davemloft.net> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: David Howells <dhowells@redhat.com>
* net: fix race condition in several drivers when reading statsKevin Groeneveld2012-07-221-4/+4
| | | | | | | | | | | Fix race condition in several network drivers when reading stats on 32bit UP architectures. These drivers update their stats in a BH context and therefore should use u64_stats_fetch_begin_bh/u64_stats_fetch_retry_bh instead of u64_stats_fetch_begin/u64_stats_fetch_retry when reading the stats. Signed-off-by: Kevin Groeneveld <kgroeneveld@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* virtio_net: use IFF_LIVE_ADDR_CHANGE priv_flagJiri Pirko2012-06-301-6/+5
| | | | | | Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
OpenPOWER on IntegriCloud