summaryrefslogtreecommitdiffstats
path: root/net/sched
Commit message (Collapse)AuthorAgeFilesLines
...
* | sch_gred: prefer GFP_KERNEL allocationsEric Dumazet2011-12-161-7/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | In control path, its better to use GFP_KERNEL allocations where possible. Before taking qdisc spinlock, we preallocate memory just in case we'll need it in gred_change_vq() This is a followup to commit 3f1e6d3fd37b (sch_gred: should not use GFP_KERNEL while holding a spinlock) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2011-12-161-1/+1
|\ \ | |/ | | | | | | | | | | Conflicts: drivers/net/ethernet/freescale/fsl_pq_mdio.c net/batman-adv/translation-table.c net/ipv6/route.c
| * sch_gred: should not use GFP_KERNEL while holding a spinlockEric Dumazet2011-12-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | gred_change_vq() is called under sch_tree_lock(sch). This means a spinlock is held, and we are not allowed to sleep in this context. We might pre-allocate memory using GFP_KERNEL before taking spinlock, but this is not suitable for stable material. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | cls_flow: remove one dynamic arrayEric Dumazet2011-12-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | Its better to use a predefined size for this small automatic variable. Removes a sparse error as well : net/sched/cls_flow.c:288:13: error: bad constant expression Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | netem: add cell concept to simulate special MAC behaviorHagen Paul Pfeifer2011-12-121-4/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This extension can be used to simulate special link layer characteristics. Simulate because packet data is not modified, only the calculation base is changed to delay a packet based on the original packet size and artificial cell information. packet_overhead can be used to simulate a link layer header compression scheme (e.g. set packet_overhead to -20) or with a positive packet_overhead value an additional MAC header can be simulated. It is also possible to "replace" the 14 byte Ethernet header with something else. cell_size and cell_overhead can be used to simulate link layer schemes, based on cells, like some TDMA schemes. Another application area are MAC schemes using a link layer fragmentation with a (small) header each. Cell size is the maximum amount of data bytes within one cell. Cell overhead is an additional variable to change the per-cell-overhead (e.g. 5 byte header per fragment). Example (5 kbit/s, 20 byte per packet overhead, cell-size 100 byte, per cell overhead 5 byte): tc qdisc add dev eth0 root netem rate 5kbit 20 100 5 Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | sch_red: generalize accurate MAX_P support to RED/GRED/CHOKEEric Dumazet2011-12-093-7/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now RED uses a Q0.32 number to store max_p (max probability), allow RED/GRED/CHOKE to use/report full resolution at config/dump time. Old tc binaries are non aware of new attributes, and still set/get Plog. New tc binary set/get both Plog and max_p for backward compatibility, they display "probability value" if they get max_p from new kernels. # tc -d qdisc show dev ... ... qdisc red 10: parent 1:1 limit 360Kb min 30Kb max 90Kb ecn ewma 5 probability 0.09 Scell_log 15 Make sure we avoid potential divides by 0 in reciprocal_value(), if (max_th - min_th) is big. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | sch_red: Adaptative RED AQMEric Dumazet2011-12-081-0/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adaptative RED AQM for linux, based on paper from Sally FLoyd, Ramakrishna Gummadi, and Scott Shenker, August 2001 : http://icir.org/floyd/papers/adaptiveRed.pdf Goal of Adaptative RED is to make max_p a dynamic value between 1% and 50% to reach the target average queue : (max_th - min_th) / 2 Every 500 ms: if (avg > target and max_p <= 0.5) increase max_p : max_p += alpha; else if (avg < target and max_p >= 0.01) decrease max_p : max_p *= beta; target :[min_th + 0.4*(min_th - max_th), min_th + 0.6*(min_th - max_th)]. alpha : min(0.01, max_p / 4) beta : 0.9 max_P is a Q0.32 fixed point number (unsigned, with 32 bits mantissa) Changes against our RED implementation are : max_p is no longer a negative power of two (1/(2^Plog)), but a Q0.32 fixed point number, to allow full range described in Adatative paper. To deliver a random number, we now use a reciprocal divide (thats really a multiply), but this operation is done once per marked/droped packet when in RED_BETWEEN_TRESH window, so added cost (compared to previous AND operation) is near zero. dump operation gives current max_p value in a new TCA_RED_MAX_P attribute. Example on a 10Mbit link : tc qdisc add dev $DEV parent 1:1 handle 10: est 1sec 8sec red \ limit 400000 min 30000 max 90000 avpkt 1000 \ burst 55 ecn adaptative bandwidth 10Mbit # tc -s -d qdisc show dev eth3 ... qdisc red 10: parent 1:1 limit 400000b min 30000b max 90000b ecn adaptative ewma 5 max_p=0.113335 Scell_log 15 Sent 50414282 bytes 34504 pkt (dropped 35, overlimits 1392 requeues 0) rate 9749Kbit 831pps backlog 72056b 16p requeues 0 marked 1357 early 35 pdrop 0 other 0 Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: Rename dst_get_neighbour{, _raw} to dst_get_neighbour_noref{, _raw}.David Miller2011-12-051-1/+1
| | | | | | | | | | | | | | | | To reflect the fact that a refrence is not obtained to the resulting neighbour entry. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Roland Dreier <roland@purestorage.com>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2011-12-022-13/+22
|\ \ | |/
| * sch_red: fix red_changeEric Dumazet2011-12-011-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Le mercredi 30 novembre 2011 à 14:36 -0800, Stephen Hemminger a écrit : > (Almost) nobody uses RED because they can't figure it out. > According to Wikipedia, VJ says that: > "there are not one, but two bugs in classic RED." RED is useful for high throughput routers, I doubt many linux machines act as such devices. I was considering adding Adaptative RED (Sally Floyd, Ramakrishna Gummadi, Scott Shender), August 2001 In this version, maxp is dynamic (from 1% to 50%), and user only have to setup min_th (target average queue size) (max_th and wq (burst in linux RED) are automatically setup) By the way it seems we have a small bug in red_change() if (skb_queue_empty(&sch->q)) red_end_of_idle_period(&q->parms); First, if queue is empty, we should call red_start_of_idle_period(&q->parms); Second, since we dont use anymore sch->q, but q->qdisc, the test is meaningless. Oh well... [PATCH] sch_red: fix red_change() Now RED is classful, we must check q->qdisc->q.qlen, and if queue is empty, we start an idle period, not end it. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sch_teql: fix lockdep splatEric Dumazet2011-11-301-11/+20
| | | | | | | | | | | | | | | | | | | | | | We need rcu_read_lock() protection before using dst_get_neighbour(), and we must cache its value (pass it to __teql_resolve()) teql_master_xmit() is called under rcu_read_lock_bh() protection, its not enough. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | netem: fix build error on 32bit archesEric Dumazet2011-12-011-1/+4
| | | | | | | | | | | | | | | | ERROR: "__udivdi3" [net/sched/sch_netem.ko] undefined! Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Hagen Paul Pfeifer <hagen@jauu.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | netem: rate extensionHagen Paul Pfeifer2011-11-301-0/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently netem is not in the ability to emulate channel bandwidth. Only static delay (and optional random jitter) can be configured. To emulate the channel rate the token bucket filter (sch_tbf) can be used. But TBF has some major emulation flaws. The buffer (token bucket depth/rate) cannot be 0. Also the idea behind TBF is that the credit (token in buckets) fills if no packet is transmitted. So that there is always a "positive" credit for new packets. In real life this behavior contradicts the law of nature where nothing can travel faster as speed of light. E.g.: on an emulated 1000 byte/s link a small IPv4/TCP SYN packet with ~50 byte require ~0.05 seconds - not 0 seconds. Netem is an excellent place to implement a rate limiting feature: static delay is already implemented, tfifo already has time information and the user can skip TBF configuration completely. This patch implement rate feature which can be configured via tc. e.g: tc qdisc add dev eth0 root netem rate 10kbit To emulate a link of 5000byte/s and add an additional static delay of 10ms: tc qdisc add dev eth0 root netem delay 10ms rate 5KBps Note: similar to TBF the rate extension is bounded to the kernel timing system. Depending on the architecture timer granularity, higher rates (e.g. 10mbit/s and higher) tend to transmission bursts. Also note: further queues living in network adaptors; see ethtool(8). Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@drr.davemloft.net>
* | sch_choke: use skb_flow_dissect()Eric Dumazet2011-11-291-89/+31
| | | | | | | | | | | | | | | | Instead of using a custom flow dissector, use skb_flow_dissect() and benefit from tunnelling support. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | sch_sfq: use skb_flow_dissect()Eric Dumazet2011-11-291-55/+10
| | | | | | | | | | | | | | | | Instead of using a custom flow dissector, use skb_flow_dissect() and benefit from tunnelling support. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: Add queue state xoff flag for stackTom Herbert2011-11-293-9/+11
| | | | | | | | | | | | | | | | | | | | Create separate queue state flags so that either the stack or drivers can turn on XOFF. Added a set of functions used in the stack to determine if a queue is really stopped (either by stack or driver) Signed-off-by: Tom Herbert <therbert@google.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | sch_sfb: use skb_flow_dissect()Eric Dumazet2011-11-281-3/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | Current SFB double hashing is not fulfilling SFB theory, if two flows share same rxhash value. Using skb_flow_dissect() permits to really have better hash dispersion, and get tunnelling support as well. Double hashing point was mentioned by Florian Westphal Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | cls_flow: use skb_flow_dissect()Eric Dumazet2011-11-281-132/+48
| | | | | | | | | | | | | | | | | | | | Instead of using a custom flow dissector, use skb_flow_dissect() and benefit from tunnelling support. This lack of tunnelling support was mentioned by Dan Siemon. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: new counter for tx_timeout errors in sysfsdavid decotigny2011-11-161-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds the /sys/class/net/DEV/queues/Q/tx_timeout attribute containing the total number of timeout events on the given queue. It is always available with CONFIG_SYSFS, independently of CONFIG_RPS/XPS. Credits to Stephen Hemminger for a preliminary version of this patch. Tested: without CONFIG_SYSFS (compilation only) with sysfs and without CONFIG_RPS & CONFIG_XPS with sysfs and without CONFIG_RPS with sysfs and without CONFIG_XPS with defaults Signed-off-by: David Decotigny <david.decotigny@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | sch_choke: use skb_header_pointer()Eric Dumazet2011-11-081-10/+17
|/ | | | | | | | | | Remove the assumption that skb_get_rxhash() makes IP header and ports linear, and use skb_header_pointer() instead in choke_match_flow() This permits __skb_get_rxhash() to use skb_header_pointer() eventually. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Add export.h for EXPORT_SYMBOL/THIS_MODULE to non-modulesPaul Gortmaker2011-10-311-0/+1
| | | | | | | | | These files are non modular, but need to export symbols using the macros now living in export.h -- call out the include so that things won't break when we remove the implicit presence of module.h from everywhere. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
* net: Fix files explicitly needing to include module.hPaul Gortmaker2011-10-313-0/+3
| | | | | | | | | With calls to modular infrastructure, these files really needs the full module.h header. Call it out so some of the cleanups of implicit and unrequired includes elsewhere can be cleaned up. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
* net_sched: cls_flow: use skb_header_pointer()Eric Dumazet2011-10-241-92/+96
| | | | | | | | | | | Dan Siemon would like to add tunnelling support to cls_flow This preliminary patch introduces use of skb_header_pointer() to help this task, while avoiding skb head reallocation because of deep packet inspection. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'master' of github.com:davem330/netDavid S. Miller2011-09-221-14/+13
|\ | | | | | | | | | | | | | | | | | | | | | | Conflicts: MAINTAINERS drivers/net/Kconfig drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c drivers/net/ethernet/broadcom/tg3.c drivers/net/wireless/iwlwifi/iwl-pci.c drivers/net/wireless/iwlwifi/iwl-trans-tx-pcie.c drivers/net/wireless/rt2x00/rt2800usb.c drivers/net/wireless/wl12xx/main.c
| * pkt_sched: cls_rsvp.h was outdatedIgor Maravić2011-09-151-14/+13
| | | | | | | | | | | | | | | | | | | | File cls_rsvp.h in /net/sched was outdated. I'm sending you patch for this file. [ tb[] array should be indexed by X not X-1 -DaveM ] Signed-off-by: Igor Maravić <igorm@etf.rs> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net_sched: sfb: optimize enqueue on full queueEric Dumazet2011-08-261-5/+8
|/ | | | | | | | | | In case SFB queue is full (hard limit reached), there is no point spending time to compute hash and maximum qlen/p_mark. We instead just early drop packet. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net_sched: fix port mirror/redirect stats reportingJamal Hadi Salim2011-08-171-2/+1
| | | | | | | | When a redirected or mirrored packet is dropped by the target device we need to record statistics. Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net_sched: prio: use qdisc_dequeue_peekedFlorian Westphal2011-08-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | commit 07bd8df5df4369487812bf85a237322ff3569b77 (sch_sfq: fix peek() implementation) changed sfq to use generic peek helper. This makes HFSC complain about a non-work-conserving child qdisc, if prio with sfq child is used within hfsc: hfsc peeks into prio qdisc, which will then peek into sfq. returned skb is stashed in sch->gso_skb. Next, hfsc tries to dequeue from prio, but prio will call sfq dequeue directly, which may return NULL instead of previously peeked-at skb. Have prio call qdisc_dequeue_peeked, so sfq->dequeue() is not called in this case. Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* sch_sfq: fix sfq_enqueue()Eric Dumazet2011-08-011-1/+6
| | | | | | | | | | | | | | | | | | commit 8efa88540635 (sch_sfq: avoid giving spurious NET_XMIT_CN signals) forgot to call qdisc_tree_decrease_qlen() to signal upper levels that a packet (from another flow) was dropped, leading to various problems. With help from Michal Soltys and Michal Pokrywka, who did a bisection. Bugzilla ref: https://bugzilla.kernel.org/show_bug.cgi?id=39372 Debian ref: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=631945 Reported-by: Lucas Bocchi <lucas.bocchi@gmail.com> Reported-and-bisected-by: Michal Pokrywka <wolfmoon@o2.pl> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Michal Soltys <soltys@ziu.info> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Abstract dst->neighbour accesses behind helpers.David S. Miller2011-07-171-2/+2
| | | | | | dst_{get,set}_neighbour() Signed-off-by: David S. Miller <davem@davemloft.net>
* Remove redundant variable/code in __qdisc_runKrishna Kumar2011-07-151-3/+1
| | | | | | | Remove redundant variable "work". Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: remove SK_ROUTE_CAPS from meta ematchMichał Mirosław2011-07-141-7/+0
| | | | | | | | | Remove it, as it indirectly exposes netdev features. It's not used in iproute2 (2.6.38) - is anything else using its interface? Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sched: constify tcf_proto and tc_actionEric Dumazet2011-07-0620-25/+27
| | | | | Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net_sched: fix dequeuer fairnessjamal2011-06-271-5/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Results on dummy device can be seen in my netconf 2011 slides. These results are for a 10Gige IXGBE intel nic - on another i5 machine, very similar specs to the one used in the netconf2011 results. It turns out - this is a hell lot worse than dummy and so this patch is even more beneficial for 10G. Test setup: ---------- System under test sending packets out. Additional box connected directly dropping packets. Installed prio qdisc on the eth device and default netdev default length of 1000 used as is. The 3 prio bands each were set to 100 (didnt factor in the results). 5 packet runs were made and the middle 3 picked. results ------- The "cpu" column indicates the which cpu the sample was taken on, The "Pkt runx" carries the number of packets a cpu dequeued when forced to be in the "dequeuer" role. The "avg" for each run is the number of times each cpu should be a "dequeuer" if the system was fair. 3.0-rc4 (plain) cpu Pkt run1 Pkt run2 Pkt run3 ================================================ cpu0 21853354 21598183 22199900 cpu1 431058 473476 393159 cpu2 481975 477529 458466 cpu3 23261406 23412299 22894315 avg 11506948 11490372 11486460 3.0-rc4 with patch and default weight 64 cpu Pkt run1 Pkt run2 Pkt run3 ================================================ cpu0 13205312 13109359 13132333 cpu1 10189914 10159127 10122270 cpu2 10213871 10124367 10168722 cpu3 13165760 13164767 13096705 avg 11693714 11639405 11630008 As you can see the system is still not perfect but is a lot better than what it was before... At the moment we use the old backlog weight, weight_p which is 64 packets. It seems to be reasonably fine with that value. The system could be made more fair if we reduce the weight_p (as per my presentation), but we are going to affect the shared backlog weight. Unless deemed necessary, I think the default value is fine. If not we could add yet another knob. Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ip: introduce ip_is_fragment helper inline functionPaul Gortmaker2011-06-214-5/+5
| | | | | | | | | | | There are enough instances of this: iph->frag_off & htons(IP_MF | IP_OFFSET) that a helper function is probably warranted. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: remove mm.h inclusion from netdevice.hAlexey Dobriyan2011-06-211-0/+1
| | | | | | | | | | | | | | | | | Remove linux/mm.h inclusion from netdevice.h -- it's unused (I've checked manually). To prevent mm.h inclusion via other channels also extract "enum dma_data_direction" definition into separate header. This tiny piece is what gluing netdevice.h with mm.h via "netdevice.h => dmaengine.h => dma-mapping.h => scatterlist.h => mm.h". Removal of mm.h from scatterlist.h was tried and was found not feasible on most archs, so the link was cutoff earlier. Hope people are OK with tiny include file. Note, that mm_types.h is still dragged in, but it is a separate story. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'master' of ↵David S. Miller2011-06-201-2/+1
|\ | | | | | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/wireless/iwlwifi/iwl-agn-rxon.c drivers/net/wireless/rtlwifi/pci.c net/netfilter/ipvs/ip_vs_core.c
| * net: Rework netdev_drivername() to avoid warning.David S. Miller2011-06-061-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | This interface uses a temporary buffer, but for no real reason. And now can generate warnings like: net/sched/sch_generic.c: In function dev_watchdog net/sched/sch_generic.c:254:10: warning: unused variable drivername Just return driver->name directly or "". Reported-by: Connor Hansen <cmdkhh@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | rtnetlink: Compute and store minimum ifinfo dump sizeGreg Rose2011-06-093-12/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The message size allocated for rtnl ifinfo dumps was limited to a single page. This is not enough for additional interface info available with devices that support SR-IOV and caused a bug in which VF info would not be displayed if more than approximately 40 VFs were created per interface. Implement a new function pointer for the rtnl_register service that will calculate the amount of data required for the ifinfo dump and allocate enough data to satisfy the request. Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* | net: remove interrupt.h inclusion from netdevice.hAlexey Dobriyan2011-06-061-0/+1
|/ | | | | | | | * remove interrupt.g inclusion from netdevice.h -- not needed * fixup fallout, add interrupt.h and hardirq.h back where needed. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sch_sfq: fix peek() implementationEric Dumazet2011-05-251-13/+1
| | | | | | | | | | | | | | | Since commit eeaeb068f139 (sch_sfq: allow big packets and be fair), sfq_peek() can return a different skb that would be normally dequeued by sfq_dequeue() [ if current slot->allot is negative ] Use generic qdisc_peek_dequeued() instead of custom implementation, to get consistent result. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Jarek Poplawski <jarkao2@gmail.com> CC: Patrick McHardy <kaber@trash.net> CC: Jesper Dangaard Brouer <hawk@diku.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
* sch_sfq: avoid giving spurious NET_XMIT_CN signalsEric Dumazet2011-05-231-2/+6
| | | | | | | | | | | | | | | | | | | | | | While chasing a possible net_sched bug, I found that IP fragments have litle chance to pass a congestioned SFQ qdisc : - Say SFQ qdisc is full because one flow is non responsive. - ip_fragment() wants to send two fragments belonging to an idle flow. - sfq_enqueue() queues first packet, but see queue limit reached : - sfq_enqueue() drops one packet from 'big consumer', and returns NET_XMIT_CN. - ip_fragment() cancel remaining fragments. This patch restores fairness, making sure we return NET_XMIT_CN only if we dropped a packet from the same flow. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Patrick McHardy <kaber@trash.net> CC: Jarek Poplawski <jarkao2@gmail.com> CC: Jamal Hadi Salim <hadi@cyberus.ca> CC: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: avoid synchronize_rcu() in dev_deactivate_manyEric Dumazet2011-05-221-2/+15
| | | | | | | | | | | | | | | | | | | | | dev_deactivate_many() issues one synchronize_rcu() call after qdiscs set to noop_qdisc. This call is here to make sure they are no outstanding qdisc-less dev_queue_xmit calls before returning to caller. But in dismantle phase, we dont have to wait, because we wont activate again the device, and we are going to wait one rcu grace period later in rollback_registered_many(). After this patch, device dismantle uses one synchronize_net() and one rcu_barrier() call only, so we have a ~30% speedup and a smaller RTNL latency. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Patrick McHardy <kaber@trash.net>, CC: Ben Greear <greearb@candelatech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6Linus Torvalds2011-05-205-3/+1151
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1446 commits) macvlan: fix panic if lowerdev in a bond tg3: Add braces around 5906 workaround. tg3: Fix NETIF_F_LOOPBACK error macvlan: remove one synchronize_rcu() call networking: NET_CLS_ROUTE4 depends on INET irda: Fix error propagation in ircomm_lmp_connect_response() irda: Kill set but unused variable 'bytes' in irlan_check_command_param() irda: Kill set but unused variable 'clen' in ircomm_connect_indication() rxrpc: Fix set but unused variable 'usage' in rxrpc_get_transport() be2net: Kill set but unused variable 'req' in lancer_fw_download() irda: Kill set but unused vars 'saddr' and 'daddr' in irlan_provider_connect_indication() atl1c: atl1c_resume() is only used when CONFIG_PM_SLEEP is defined. rxrpc: Fix set but unused variable 'usage' in rxrpc_get_peer(). rxrpc: Kill set but unused variable 'local' in rxrpc_UDP_error_handler() rxrpc: Kill set but unused variable 'sp' in rxrpc_process_connection() rxrpc: Kill set but unused variable 'sp' in rxrpc_rotate_tx_window() pkt_sched: Kill set but unused variable 'protocol' in tc_classify() isdn: capi: Use pr_debug() instead of ifdefs. tg3: Update version to 3.119 tg3: Apply rx_discards fix to 5719/5720 ... Fix up trivial conflicts in arch/x86/Kconfig and net/mac80211/agg-tx.c as per Davem.
| * networking: NET_CLS_ROUTE4 depends on INETRandy Dunlap2011-05-191-0/+1
| | | | | | | | | | | | | | | | | | | | | | IP_ROUTE_CLASSID depends on INET and NET_CLS_ROUTE4 selects IP_ROUTE_CLASSID, but when INET is not enabled, this kconfig warning is produced, so fix it by making NET_CLS_ROUTE4 depend on INET. warning: (NET_CLS_ROUTE4) selects IP_ROUTE_CLASSID which has unmet direct dependencies (NET && INET) Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * pkt_sched: Kill set but unused variable 'protocol' in tc_classify()David S. Miller2011-05-191-2/+0
| | | | | | | | | | | | | | I checked the history and this has been like this since the beginning of time. Signed-off-by: David S. Miller <davem@davemloft.net>
| * inet: constify ip headers and in6_addrEric Dumazet2011-04-221-1/+1
| | | | | | | | | | | | | | | | Add const qualifiers to structs iphdr, ipv6hdr and in6_addr pointers where possible, to make code intention more obvious. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * Merge branch 'master' of ↵David S. Miller2011-04-115-7/+7
| |\ | | | | | | | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/smsc911x.c
| * | pkt_sched: QFQ - quick fair queue schedulerstephen hemminger2011-04-043-0/+1149
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is an implementation of the Quick Fair Queue scheduler developed by Fabio Checconi. The same algorithm is already implemented in ipfw in FreeBSD. Fabio had an earlier version developed on Linux, I just cleaned it up. Thanks to Eric Dumazet for testing this under load. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net,act_police,rcu: remove rcu_barrier()Lai Jiangshan2011-05-071-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is no callback of this module maybe queued since we use kfree_rcu(), we can safely remove the rcu_barrier(). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
OpenPOWER on IntegriCloud