summaryrefslogtreecommitdiffstats
path: root/net/sched
Commit message (Collapse)AuthorAgeFilesLines
...
| * | | | | | net: sched: red: inform offloads about harddrop settingJakub Kicinski2018-11-081-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To mirror software behaviour on offload more precisely inform the drivers about the state of the harddrop flag. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | net/core: use __vlan_hwaccel helpersMichał Mirosław2018-11-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This removes assumptions about VLAN_TAG_PRESENT bit. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | net: sched: prio: delay destroying child qdiscs on changeJakub Kicinski2018-11-081-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move destroying of the old child qdiscs outside of the sch_tree_lock() section. This should improve the software qdisc replace but is even more important for offloads. Calling offloads under a spin lock is best avoided, and child's destroy would be called under sch_tree_lock(). Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | net: sched: red: delay destroying child qdisc on replaceJakub Kicinski2018-11-081-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move destroying of the old child qdisc outside of the sch_tree_lock() section. This should improve the software qdisc replace but is even more important for offloads. Firstly calling offloads under a spin lock is best avoided. Secondly the destroy event of existing child would have been sent to the offload device before the replace, causing confusion. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | net: sched: refactor grafting Qdiscs with a parentJakub Kicinski2018-11-081-14/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The code for grafting Qdiscs when there is a parent has two needless indentation levels, and breaks the "keep the success path unindented" guideline. Refactor. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | net: sched: add an offload graft helperJakub Kicinski2018-11-082-24/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Qdisc graft operation of offload-capable qdiscs performs a few extra steps which are identical among all the qdiscs. Add a helper to share this code. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | net: sched: set TCQ_F_OFFLOADED flag for MQJakub Kicinski2018-11-081-6/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PRIO and RED mark the qdisc with TCQ_F_OFFLOADED upon successful offload, make MQ do the same. The consistency will help with consistent graft callback behaviour. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | net: sched: red: remove unnecessary red_dump_offload_stats parameterJakub Kicinski2018-11-081-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Offload dump helper does not use opt parameter, remove it. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | net: sched: add an offload dump helperJakub Kicinski2018-11-083-31/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Qdisc dump operation of offload-capable qdiscs performs a few extra steps which are identical among all the qdiscs. Add a helper to share this code. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds2018-12-262-5/+5
|\ \ \ \ \ \ \ | |_|_|_|_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "The biggest RCU changes in this cycle were: - Convert RCU's BUG_ON() and similar calls to WARN_ON() and similar. - Replace calls of RCU-bh and RCU-sched update-side functions to their vanilla RCU counterparts. This series is a step towards complete removal of the RCU-bh and RCU-sched update-side functions. ( Note that some of these conversions are going upstream via their respective maintainers. ) - Documentation updates, including a number of flavor-consolidation updates from Joel Fernandes. - Miscellaneous fixes. - Automate generation of the initrd filesystem used for rcutorture testing. - Convert spin_is_locked() assertions to instead use lockdep. ( Note that some of these conversions are going upstream via their respective maintainers. ) - SRCU updates, especially including a fix from Dennis Krein for a bag-on-head-class bug. - RCU torture-test updates" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (112 commits) rcutorture: Don't do busted forward-progress testing rcutorture: Use 100ms buckets for forward-progress callback histograms rcutorture: Recover from OOM during forward-progress tests rcutorture: Print forward-progress test age upon failure rcutorture: Print time since GP end upon forward-progress failure rcutorture: Print histogram of CB invocation at OOM time rcutorture: Print GP age upon forward-progress failure rcu: Print per-CPU callback counts for forward-progress failures rcu: Account for nocb-CPU callback counts in RCU CPU stall warnings rcutorture: Dump grace-period diagnostics upon forward-progress OOM rcutorture: Prepare for asynchronous access to rcu_fwd_startat torture: Remove unnecessary "ret" variables rcutorture: Affinity forward-progress test to avoid housekeeping CPUs rcutorture: Break up too-long rcu_torture_fwd_prog() function rcutorture: Remove cbflood facility torture: Bring any extra CPUs online during kernel startup rcutorture: Add call_rcu() flooding forward-progress tests rcutorture/formal: Replace synchronize_sched() with synchronize_rcu() tools/kernel.h: Replace synchronize_sched() with synchronize_rcu() net/decnet: Replace rcu_barrier_bh() with rcu_barrier() ...
| * | | | | | Merge branch 'for-mingo' of ↵Ingo Molnar2018-12-042-5/+5
| |\ \ \ \ \ \ | | |_|_|_|/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU changes from Paul E. McKenney: - Convert RCU's BUG_ON() and similar calls to WARN_ON() and similar. - Replace calls of RCU-bh and RCU-sched update-side functions to their vanilla RCU counterparts. This series is a step towards complete removal of the RCU-bh and RCU-sched update-side functions. ( Note that some of these conversions are going upstream via their respective maintainers. ) - Documentation updates, including a number of flavor-consolidation updates from Joel Fernandes. - Miscellaneous fixes. - Automate generation of the initrd filesystem used for rcutorture testing. - Convert spin_is_locked() assertions to instead use lockdep. ( Note that some of these conversions are going upstream via their respective maintainers. ) - SRCU updates, especially including a fix from Dennis Krein for a bag-on-head-class bug. - RCU torture-test updates. Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | * | | | | net/sched: Replace call_rcu_bh() and rcu_barrier_bh()Paul E. McKenney2018-12-012-5/+5
| | |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that call_rcu()'s callback is not invoked until after bh-disable regions of code have completed (in addition to explicitly marked RCU read-side critical sections), call_rcu() can be used in place of call_rcu_bh(). Similarly, rcu_barrier() can be used in place o frcu_barrier_bh(). This commit therefore makes these changes. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jiri Pirko <jiri@resnulli.us> Cc: "David S. Miller" <davem@davemloft.net> Cc: <netdev@vger.kernel.org>
* | | | | / net/sched: cls_flower: Remove old entries from rhashtableRoi Dayan2018-12-191-4/+3
| |_|_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When replacing a rule we add the new rule to the rhashtable but only remove the old if not in skip_sw. This commit fix this and remove the old rule anyway. Fixes: 35cc3cefc4de ("net/sched: cls_flower: Reject duplicated rules also under skip_sw") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | net/sched: cls_flower: Reject duplicated rules also under skip_swOr Gerlitz2018-12-091-13/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, duplicated rules are rejected only for skip_hw or "none", hence allowing users to push duplicates into HW for no reason. Use the flower tables to protect for that. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Paul Blakey <paulb@mellanox.com> Reported-by: Chris Mi <chrism@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | net/sched: act_police: fix memory leak in case of invalid control actionDavide Caratti2018-11-301-12/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | when users set an invalid control action, kmemleak complains as follows: # echo clear >/sys/kernel/debug/kmemleak # ./tdc.py -e b48b Test b48b: Add police action with exceed goto chain control action All test results: 1..1 ok 1 - b48b # Add police action with exceed goto chain control action about to flush the tap output if tests need to be skipped done flushing skipped test tap output # echo scan >/sys/kernel/debug/kmemleak # cat /sys/kernel/debug/kmemleak unreferenced object 0xffffa0fafbc3dde0 (size 96): comm "tc", pid 2358, jiffies 4294922738 (age 17.022s) hex dump (first 32 bytes): 2a 00 00 20 00 00 00 00 00 00 7d 00 00 00 00 00 *.. ......}..... f8 07 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000648803d2>] tcf_action_init_1+0x384/0x4c0 [<00000000cb69382e>] tcf_action_init+0x12b/0x1a0 [<00000000847ef0d4>] tcf_action_add+0x73/0x170 [<0000000093656e14>] tc_ctl_action+0x122/0x160 [<0000000023c98e32>] rtnetlink_rcv_msg+0x263/0x2d0 [<000000003493ae9c>] netlink_rcv_skb+0x4d/0x130 [<00000000de63f8ba>] netlink_unicast+0x209/0x2d0 [<00000000c3da0ebe>] netlink_sendmsg+0x2c1/0x3c0 [<000000007a9e0753>] sock_sendmsg+0x33/0x40 [<00000000457c6d2e>] ___sys_sendmsg+0x2a0/0x2f0 [<00000000c5c6a086>] __sys_sendmsg+0x5e/0xa0 [<00000000446eafce>] do_syscall_64+0x5b/0x180 [<000000004aa871f2>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [<00000000450c38ef>] 0xffffffffffffffff change tcf_police_init() to avoid leaking 'new' in case TCA_POLICE_RESULT contains TC_ACT_GOTO_CHAIN extended action. Fixes: c08f5ed5d625 ("net/sched: act_police: disallow 'goto chain' on fallback control action") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | net: Prevent invalid access to skb->prev in __qdisc_drop_allChristoph Paasch2018-11-291-0/+3
|/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | __qdisc_drop_all() accesses skb->prev to get to the tail of the segment-list. With commit 68d2f84a1368 ("net: gro: properly remove skb from list") the skb-list handling has been changed to set skb->next to NULL and set the list-poison on skb->prev. With that change, __qdisc_drop_all() will panic when it tries to dereference skb->prev. Since commit 992cba7e276d ("net: Add and use skb_list_del_init().") __list_del_entry is used, leaving skb->prev unchanged (thus, pointing to the list-head if it's the first skb of the list). This will make __qdisc_drop_all modify the next-pointer of the list-head and result in a panic later on: [ 34.501053] general protection fault: 0000 [#1] SMP KASAN PTI [ 34.501968] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.20.0-rc2.mptcp #108 [ 34.502887] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.5.1 01/01/2011 [ 34.504074] RIP: 0010:dev_gro_receive+0x343/0x1f90 [ 34.504751] Code: e0 48 c1 e8 03 42 80 3c 30 00 0f 85 4a 1c 00 00 4d 8b 24 24 4c 39 65 d0 0f 84 0a 04 00 00 49 8d 7c 24 38 48 89 f8 48 c1 e8 03 <42> 0f b6 04 30 84 c0 74 08 3c 04 [ 34.507060] RSP: 0018:ffff8883af507930 EFLAGS: 00010202 [ 34.507761] RAX: 0000000000000007 RBX: ffff8883970b2c80 RCX: 1ffff11072e165a6 [ 34.508640] RDX: 1ffff11075867008 RSI: ffff8883ac338040 RDI: 0000000000000038 [ 34.509493] RBP: ffff8883af5079d0 R08: ffff8883970b2d40 R09: 0000000000000062 [ 34.510346] R10: 0000000000000034 R11: 0000000000000000 R12: 0000000000000000 [ 34.511215] R13: 0000000000000000 R14: dffffc0000000000 R15: ffff8883ac338008 [ 34.512082] FS: 0000000000000000(0000) GS:ffff8883af500000(0000) knlGS:0000000000000000 [ 34.513036] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 34.513741] CR2: 000055ccc3e9d020 CR3: 00000003abf32000 CR4: 00000000000006e0 [ 34.514593] Call Trace: [ 34.514893] <IRQ> [ 34.515157] napi_gro_receive+0x93/0x150 [ 34.515632] receive_buf+0x893/0x3700 [ 34.516094] ? __netif_receive_skb+0x1f/0x1a0 [ 34.516629] ? virtnet_probe+0x1b40/0x1b40 [ 34.517153] ? __stable_node_chain+0x4d0/0x850 [ 34.517684] ? kfree+0x9a/0x180 [ 34.518067] ? __kasan_slab_free+0x171/0x190 [ 34.518582] ? detach_buf+0x1df/0x650 [ 34.519061] ? lapic_next_event+0x5a/0x90 [ 34.519539] ? virtqueue_get_buf_ctx+0x280/0x7f0 [ 34.520093] virtnet_poll+0x2df/0xd60 [ 34.520533] ? receive_buf+0x3700/0x3700 [ 34.521027] ? qdisc_watchdog_schedule_ns+0xd5/0x140 [ 34.521631] ? htb_dequeue+0x1817/0x25f0 [ 34.522107] ? sch_direct_xmit+0x142/0xf30 [ 34.522595] ? virtqueue_napi_schedule+0x26/0x30 [ 34.523155] net_rx_action+0x2f6/0xc50 [ 34.523601] ? napi_complete_done+0x2f0/0x2f0 [ 34.524126] ? kasan_check_read+0x11/0x20 [ 34.524608] ? _raw_spin_lock+0x7d/0xd0 [ 34.525070] ? _raw_spin_lock_bh+0xd0/0xd0 [ 34.525563] ? kvm_guest_apic_eoi_write+0x6b/0x80 [ 34.526130] ? apic_ack_irq+0x9e/0xe0 [ 34.526567] __do_softirq+0x188/0x4b5 [ 34.527015] irq_exit+0x151/0x180 [ 34.527417] do_IRQ+0xdb/0x150 [ 34.527783] common_interrupt+0xf/0xf [ 34.528223] </IRQ> This patch makes sure that skb->prev is set to NULL when entering netem_enqueue. Cc: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp> Cc: Tyler Hicks <tyhicks@canonical.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Fixes: 68d2f84a1368 ("net: gro: properly remove skb from list") Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Christoph Paasch <cpaasch@apple.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | net/sched: act_police: add missing spinlock initializationDavide Caratti2018-11-231-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit f2cbd4852820 ("net/sched: act_police: fix race condition on state variables") introduces a new spinlock, but forgets its initialization. Ensure that tcf_police_init() initializes 'tcfp_lock' every time a 'police' action is newly created, to avoid the following lockdep splat: INFO: trying to register non-static key. the code is fine but needs lockdep annotation. turning off the locking correctness validator. <...> Call Trace: dump_stack+0x85/0xcb register_lock_class+0x581/0x590 __lock_acquire+0xd4/0x1330 ? tcf_police_init+0x2fa/0x650 [act_police] ? lock_acquire+0x9e/0x1a0 lock_acquire+0x9e/0x1a0 ? tcf_police_init+0x2fa/0x650 [act_police] ? tcf_police_init+0x55a/0x650 [act_police] _raw_spin_lock_bh+0x34/0x40 ? tcf_police_init+0x2fa/0x650 [act_police] tcf_police_init+0x2fa/0x650 [act_police] tcf_action_init_1+0x384/0x4c0 tcf_action_init+0xf6/0x160 tcf_action_add+0x73/0x170 tc_ctl_action+0x122/0x160 rtnetlink_rcv_msg+0x2a4/0x490 ? netlink_deliver_tap+0x99/0x400 ? validate_linkmsg+0x370/0x370 netlink_rcv_skb+0x4d/0x130 netlink_unicast+0x196/0x230 netlink_sendmsg+0x2e5/0x3e0 sock_sendmsg+0x36/0x40 ___sys_sendmsg+0x280/0x2f0 ? _raw_spin_unlock+0x24/0x30 ? handle_pte_fault+0xafe/0xf30 ? find_held_lock+0x2d/0x90 ? syscall_trace_enter+0x1df/0x360 ? __sys_sendmsg+0x5e/0xa0 __sys_sendmsg+0x5e/0xa0 do_syscall_64+0x60/0x210 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7f1841c7cf10 Code: c3 48 8b 05 82 6f 2c 00 f7 db 64 89 18 48 83 cb ff eb dd 0f 1f 80 00 00 00 00 83 3d 8d d0 2c 00 00 75 10 b8 2e 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ae cc 00 00 48 89 04 24 RSP: 002b:00007ffcf9df4d68 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f1841c7cf10 RDX: 0000000000000000 RSI: 00007ffcf9df4dc0 RDI: 0000000000000003 RBP: 000000005bf56105 R08: 0000000000000002 R09: 00007ffcf9df8edc R10: 00007ffcf9df47e0 R11: 0000000000000246 R12: 0000000000671be0 R13: 00007ffcf9df4e84 R14: 0000000000000008 R15: 0000000000000000 Fixes: f2cbd4852820 ("net/sched: act_police: fix race condition on state variables") Reported-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | net/sched: act_police: fix race condition on state variablesDavide Caratti2018-11-201-14/+21
| |_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | after 'police' configuration parameters were converted to use RCU instead of spinlock, the state variables used to compute the traffic rate (namely 'tcfp_toks', 'tcfp_ptoks' and 'tcfp_t_c') are erroneously read/updated in the traffic path without any protection. Use a dedicated spinlock to avoid race conditions on these variables, and ensure proper cache-line alignment. In this way, 'police' is still faster than what we observed when 'tcf_lock' was used in the traffic path _ i.e. reverting commit 2d550dbad83c ("net/sched: act_police: don't use spinlock in the data path"). Moreover, we preserve the throughput improvement that was obtained after 'police' started using per-cpu counters, when 'avrate' is used instead of 'rate'. Changes since v1 (thanks to Eric Dumazet): - call ktime_get_ns() before acquiring the lock in the traffic path - use a dedicated spinlock instead of tcf_lock - improve cache-line usage Fixes: 2d550dbad83c ("net/sched: act_police: don't use spinlock in the data path") Reported-and-suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com>
* | | net/sched: act_pedit: fix memory leak when IDR allocation failsDavide Caratti2018-11-161-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | tcf_idr_check_alloc() can return a negative value, on allocation failures (-ENOMEM) or IDR exhaustion (-ENOSPC): don't leak keys_ex in these cases. Fixes: 0190c1d452a9 ("net: sched: atomically check-allocate action") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net_sched: sch_fq: ensure maxrate fq parameter applies to EDT flowsEric Dumazet2018-11-151-12/+19
| |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When EDT conversion happened, fq lost the ability to enfore a maxrate for all flows. It kept it for non EDT flows. This commit restores the functionality. Tested: tc qd replace dev eth0 root fq maxrate 500Mbit netperf -P0 -H host -- -O THROUGHPUT 489.75 Fixes: ab408b6dc744 ("tcp: switch tcp and sch_fq to new earliest departure time model") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | act_mirred: clear skb->tstamp on redirectEric Dumazet2018-11-112-10/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | If sch_fq is used at ingress, skbs that might have been timestamped by net_timestamp_set() if a packet capture is requesting timestamps could be delayed by arbitrary amount of time, since sch_fq time base is MONOTONIC. Fix this problem by moving code from sch_netem.c to act_mirred.c. Fixes: fb420d5d91c1 ("tcp/fq: move back to CLOCK_MONOTONIC") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: sched: cls_flower: validate nested enc_opts_policy to avoid warningJakub Kicinski2018-11-101-1/+13
|/ | | | | | | | | | | | | | | | | | | | | TCA_FLOWER_KEY_ENC_OPTS and TCA_FLOWER_KEY_ENC_OPTS_MASK can only currently contain further nested attributes, which are parsed by hand, so the policy is never actually used resulting in a W=1 build warning: net/sched/cls_flower.c:492:1: warning: ‘enc_opts_policy’ defined but not used [-Wunused-const-variable=] enc_opts_policy[TCA_FLOWER_KEY_ENC_OPTS_MAX + 1] = { Add the validation anyway to avoid potential bugs when other attributes are added and to make the attribute structure slightly more clear. Validation will also set extact to point to bad attribute on error. Fixes: 0a6e77784f49 ("net/sched: allow flower to match tunnel options") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Simon Horman <simon.horman@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sched: gred: pass the right attribute to gred_change_table_def()Jakub Kicinski2018-10-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | gred_change_table_def() takes a pointer to TCA_GRED_DPS attribute, and expects it will be able to interpret its contents as struct tc_gred_sopt. Pass the correct gred attribute, instead of TCA_OPTIONS. This bug meant the table definition could never be changed after Qdisc was initialized (unless whatever TCA_OPTIONS contained both passed netlink validation and was a valid struct tc_gred_sopt...). Old behaviour: $ ip link add type dummy $ tc qdisc replace dev dummy0 parent root handle 7: \ gred setup vqs 4 default 0 $ tc qdisc replace dev dummy0 parent root handle 7: \ gred setup vqs 4 default 0 RTNETLINK answers: Invalid argument Now: $ ip link add type dummy $ tc qdisc replace dev dummy0 parent root handle 7: \ gred setup vqs 4 default 0 $ tc qdisc replace dev dummy0 parent root handle 7: \ gred setup vqs 4 default 0 $ tc qdisc replace dev dummy0 parent root handle 7: \ gred setup vqs 4 default 0 Fixes: f62d6b936df5 ("[PKT_SCHED]: GRED: Use central VQ change procedure") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sched: Remove TCA_OPTIONS from policyDavid Ahern2018-10-241-1/+0
| | | | | | | | | | | | | | Marco reported an error with hfsc: root@Calimero:~# tc qdisc add dev eth0 root handle 1:0 hfsc default 1 Error: Attribute failed policy validation. Apparently a few implementations pass TCA_OPTIONS as a binary instead of nested attribute, so drop TCA_OPTIONS from the policy. Fixes: 8b4c3cdd9dd8 ("net: sched: Add policy validation for tc attributes") Reported-by: Marco Berizzi <pupilla@libero.it> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/sched: act_police: disallow 'goto chain' on fallback control actionDavide Caratti2018-10-221-2/+10
| | | | | | | | | | | | | | | in the following command: # tc action add action police rate <r> burst <b> conform-exceed <c1>/<c2> 'goto chain x' is allowed only for c1: setting it for c2 makes the kernel crash with NULL pointer dereference, since TC core doesn't initialize the chain handle. Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/sched: act_gact: disallow 'goto chain' on fallback control actionDavide Caratti2018-10-221-0/+5
| | | | | | | | | | | | | | | in the following command: # tc action add action <c1> random <rand_type> <c2> <rand_param> 'goto chain x' is allowed only for c1: setting it for c2 makes the kernel crash with NULL pointer dereference, since TC core doesn't initialize the chain handle. Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2018-10-192-10/+13
|\ | | | | | | | | | | | | | | | | | | | | | | | | net/sched/cls_api.c has overlapping changes to a call to nlmsg_parse(), one (from 'net') added rtm_tca_policy instead of NULL to the 5th argument, and another (from 'net-next') added cb->extack instead of NULL to the 6th argument. net/ipv4/ipmr_base.c is a case of a bug fix in 'net' being done to code which moved (to mr_table_dump)) in 'net-next'. Thanks to David Ahern for the heads up. Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: sched: Fix for duplicate class dumpPhil Sutter2018-10-181-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When dumping classes by parent, kernel would return classes twice: | # tc qdisc add dev lo root prio | # tc class show dev lo | class prio 8001:1 parent 8001: | class prio 8001:2 parent 8001: | class prio 8001:3 parent 8001: | # tc class show dev lo parent 8001: | class prio 8001:1 parent 8001: | class prio 8001:2 parent 8001: | class prio 8001:3 parent 8001: | class prio 8001:1 parent 8001: | class prio 8001:2 parent 8001: | class prio 8001:3 parent 8001: This comes from qdisc_match_from_root() potentially returning the root qdisc itself if its handle matched. Though in that case, root's classes were already dumped a few lines above. Fixes: cb395b2010879 ("net: sched: optimize class dumps") Signed-off-by: Phil Sutter <phil@nwl.cc> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/sched: cls_api: add missing validation of netlink attributesDavide Caratti2018-10-152-9/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Similarly to what has been done in 8b4c3cdd9dd8 ("net: sched: Add policy validation for tc attributes"), fix classifier code to add validation of TCA_CHAIN and TCA_KIND netlink attributes. tested with: # ./tdc.py -c filter v2: Let sch_api and cls_api share nla_policy they have in common, thanks to David Ahern. v3: Avoid EXPORT_SYMBOL(), as validation of those attributes is not done by TC modules, thanks to Cong Wang. While at it, restore the 'Delete / get qdisc' comment to its orginal position, just above tc_get_qdisc() function prototype. Fixes: 5bc1701881e39 ("net: sched: introduce multichain support for filters") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net_sched: sch_fq: no longer use skb_is_tcp_pure_ack()Eric Dumazet2018-10-151-1/+1
| | | | | | | | | | | | | | | | | | With the new EDT model, sch_fq no longer has to special case TCP pure acks, since their skb->tstamp will allow them being sent without pacing delay. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: extend sk_pacing_rate to unsigned longEric Dumazet2018-10-151-8/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sk_pacing_rate has beed introduced as a u32 field in 2013, effectively limiting per flow pacing to 34Gbit. We believe it is time to allow TCP to pace high speed flows on 64bit hosts, as we now can reach 100Gbit on one TCP flow. This patch adds no cost for 32bit kernels. The tcpi_pacing_rate and tcpi_max_pacing_rate were already exported as 64bit, so iproute2/ss command require no changes. Unfortunately the SO_MAX_PACING_RATE socket option will stay 32bit and we will need to add a new option to let applications control high pacing rates. State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 1787144 10.246.9.76:49992 10.246.9.77:36741 timer:(on,003ms,0) ino:91863 sk:2 <-> skmem:(r0,rb540000,t66440,tb2363904,f605944,w1822984,o0,bl0,d0) ts sack bbr wscale:8,8 rto:201 rtt:0.057/0.006 mss:1448 rcvmss:536 advmss:1448 cwnd:138 ssthresh:178 bytes_acked:256699822585 segs_out:177279177 segs_in:3916318 data_segs_out:177279175 bbr:(bw:31276.8Mbps,mrtt:0,pacing_gain:1.25,cwnd_gain:2) send 28045.5Mbps lastrcv:73333 pacing_rate 38705.0Mbps delivery_rate 22997.6Mbps busy:73333ms unacked:135 retrans:0/157 rcv_space:14480 notsent:2085120 minrtt:0.013 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2018-10-122-4/+4
|\| | | | | | | | | | | | | | | Conflicts were easy to resolve using immediate context mostly, except the cls_u32.c one where I simply too the entire HEAD chunk. Signed-off-by: David S. Miller <davem@davemloft.net>
| * Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netGreg Kroah-Hartman2018-10-121-5/+5
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | David writes: "Networking 1) RXRPC receive path fixes from David Howells. 2) Re-export __skb_recv_udp(), from Jiri Kosina. 3) Fix refcounting in u32 classificer, from Al Viro. 4) Userspace netlink ABI fixes from Eugene Syromiatnikov. 5) Don't double iounmap on rmmod in ena driver, from Arthur Kiyanovski. 6) Fix devlink string attribute handling, we must pull a copy into a kernel buffer if the lifetime extends past the netlink request. From Moshe Shemesh. 7) Fix hangs in RDS, from Ka-Cheong Poon. 8) Fix recursive locking lockdep warnings in tipc, from Ying Xue. 9) Clear RX irq correctly in socionext, from Ilias Apalodimas. 10) bcm_sf2 fixes from Florian Fainelli." * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (38 commits) net: dsa: bcm_sf2: Call setup during switch resume net: dsa: bcm_sf2: Fix unbind ordering net: phy: sfp: remove sfp_mutex's definition r8169: set RX_MULTI_EN bit in RxConfig for 8168F-family chips net: socionext: clear rx irq correctly net/mlx4_core: Fix warnings during boot on driverinit param set failures tipc: eliminate possible recursive locking detected by LOCKDEP selftests: udpgso_bench.sh explicitly requires bash selftests: rtnetlink.sh explicitly requires bash. qmi_wwan: Added support for Gemalto's Cinterion ALASxx WWAN interface tipc: queue socket protocol error messages into socket receive buffer tipc: set link tolerance correctly in broadcast link net: ipv4: don't let PMTU updates increase route MTU net: ipv4: update fnhe_pmtu when first hop's MTU changes net/ipv6: stop leaking percpu memory in fib6 info rds: RDS (tcp) hangs on sendto() to unresponding address net: make skb_partial_csum_set() more robust against overflows devlink: Add helper function for safely copy string param devlink: Fix param cmode driverinit for string type devlink: Fix param set handling for string type ...
| | * net: sched: cls_u32: fix hnode refcountingAl Viro2018-10-071-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cls_u32.c misuses refcounts for struct tc_u_hnode - it counts references via ->hlist and via ->tp_root together. u32_destroy() drops the former and, in case when there had been links, leaves the sucker on the list. As the result, there's nothing to protect it from getting freed once links are dropped. That also makes the "is it busy" check incapable of catching the root hnode - it *is* busy (there's a reference from tp), but we don't see it as something separate. "Is it our root?" check partially covers that, but the problem exists for others' roots as well. AFAICS, the minimal fix preserving the existing behaviour (where it doesn't include oopsen, that is) would be this: * count tp->root and tp_c->hlist as separate references. I.e. have u32_init() set refcount to 2, not 1. * in u32_destroy() we always drop the former; in u32_destroy_hnode() - the latter. That way we have *all* references contributing to refcount. List removal happens in u32_destroy_hnode() (called only when ->refcnt is 1) an in u32_destroy() in case of tc_u_common going away, along with everything reachable from it. IOW, that way we know that u32_destroy_key() won't free something still on the list (or pointed to by someone's ->root). Reproducer: tc qdisc add dev eth0 ingress tc filter add dev eth0 parent ffff: protocol ip prio 100 handle 1: \ u32 divisor 1 tc filter add dev eth0 parent ffff: protocol ip prio 200 handle 2: \ u32 divisor 1 tc filter add dev eth0 parent ffff: protocol ip prio 100 \ handle 1:0:11 u32 ht 1: link 801: offset at 0 mask 0f00 shift 6 \ plus 0 eat match ip protocol 6 ff tc filter delete dev eth0 parent ffff: protocol ip prio 200 tc filter change dev eth0 parent ffff: protocol ip prio 100 \ handle 1:0:11 u32 ht 1: link 0: offset at 0 mask 0f00 shift 6 plus 0 \ eat match ip protocol 6 ff tc filter delete dev eth0 parent ffff: protocol ip prio 100 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | Merge tag 'alloc-args-v4.19-rc8' of ↵Greg Kroah-Hartman2018-10-111-1/+1
| |\ \ | | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Kees writes: "Fix open-coded multiplication arguments to allocators - Fixes several new open-coded multiplications added in the 4.19 merge window." * tag 'alloc-args-v4.19-rc8' of https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: treewide: Replace more open-coded allocation size multiplications
| | * treewide: Replace more open-coded allocation size multiplicationsKees Cook2018-10-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As done treewide earlier, this catches several more open-coded allocation size calculations that were added to the kernel during the merge window. This performs the following mechanical transformations using Coccinelle: kvmalloc(a * b, ...) -> kvmalloc_array(a, b, ...) kvzalloc(a * b, ...) -> kvcalloc(a, b, ...) devm_kzalloc(..., a * b, ...) -> devm_kcalloc(..., a, b, ...) Signed-off-by: Kees Cook <keescook@chromium.org>
* | | net: sched: avoid writing on noop_qdiscEric Dumazet2018-10-101-2/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While noop_qdisc.gso_skb and noop_qdisc.skb_bad_txq are not used in other places, it seems not correct to overwrite their fields in dev_init_scheduler_queue(). noop_qdisc is essentially a shared and read-only object, even if it is not marked as const because of some implementation detail. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: Add extack to nlmsg_parseDavid Ahern2018-10-083-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make sure extack is passed to nlmsg_parse where easy to do so. Most of these are dump handlers and leveraging the extack in the netlink_callback. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: sched: cls_u32: simplify the hell out u32_delete() emptiness checkAl Viro2018-10-081-47/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that we have the knode count, we can instantly check if any hnodes are non-empty. And that kills the check for extra references to root hnode - those could happen only if there was a knode to carry such a link. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: sched: cls_u32: keep track of knodes count in tc_u_commonAl Viro2018-10-081-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | allows to simplify u32_delete() considerably Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: sched: cls_u32: get rid of tp_cAl Viro2018-10-081-7/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | Both hnode ->tp_c and tp_c argument of u32_set_parms() the latter is redundant, the former - never read... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: sched: cls_u32: the tp_c argument of u32_set_parms() is always tp->dataAl Viro2018-10-081-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It must be tc_u_common associated with that tp (i.e. tp->data). Proof: * both ->ht_up and ->tp_c are assign-once * ->tp_c of anything inserted into tp_c->hlist is tp_c * hnodes never get reinserted into the lists or moved between those, so anything found by u32_lookup_ht(tp->data, ...) will have ->tp_c equal to tp->data. * tp->root->tp_c == tp->data. * ->ht_up of anything inserted into hnode->ht[...] is equal to hnode. * knodes never get reinserted into hash chains or moved between those, so anything returned by u32_lookup_key(ht, ...) will have ->ht_up equal to ht. * any knode returned by u32_get(tp, ...) will have ->ht_up->tp_c point to tp->data Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: sched: cls_u32: pass tc_u_common to u32_set_parms() instead of tc_u_hnodeAl Viro2018-10-081-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | the only thing we used ht for was ht->tp_c and callers can get that without going through ->tp_c at all; start with lifting that into the callers, next commits will massage those, eventually removing ->tp_c altogether. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: sched: cls_u32: clean tc_u_common hashtableAl Viro2018-10-081-15/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * calculate key *once*, not for each hash chain element * let tc_u_hash() return the pointer to chain head rather than index - callers are cleaner that way. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: sched: cls_u32: get rid of tc_u_common ->rcuAl Viro2018-10-081-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | unused Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: sched: cls_u32: get rid of tc_u_knode ->tpAl Viro2018-10-081-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | not used anymore Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: sched: cls_u32: get rid of unused argument of u32_destroy_key()Al Viro2018-10-081-7/+6
| | | | | | | | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: sched: cls_u32: make sure that divisor is a power of 2Al Viro2018-10-081-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | Tested by modifying iproute2 to allow sending a divisor > 255 Tested-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: sched: cls_u32: disallow linking to root hnodeAl Viro2018-10-081-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Operation makes no sense. Nothing will actually break if we do so (depth limit in u32_classify() will prevent infinite loops), but according to maintainers it's best prohibited outright. NOTE: doing so guarantees that u32_destroy() will trigger the call of u32_destroy_hnode(); we might want to make that unconditional. Test: tc qdisc add dev eth0 ingress tc filter add dev eth0 parent ffff: protocol ip prio 100 u32 \ link 800: offset at 0 mask 0f00 shift 6 plus 0 eat match ip protocol 6 ff should fail with Error: cls_u32: Not linking to root node Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: sched: cls_u32: mark root hnode explicitlyAl Viro2018-10-081-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ... and produce consistent error on attempt to delete such. Existing check in u32_delete() is inconsistent - after tc qdisc add dev eth0 ingress tc filter add dev eth0 parent ffff: protocol ip prio 100 handle 1: u32 \ divisor 1 tc filter add dev eth0 parent ffff: protocol ip prio 200 handle 2: u32 \ divisor 1 both tc filter delete dev eth0 parent ffff: protocol ip prio 100 handle 801: u32 and tc filter delete dev eth0 parent ffff: protocol ip prio 100 handle 800: u32 will fail (at least with refcounting fixes), but the former will complain about an attempt to remove a busy table, while the latter will recognize it as root and yield "Not allowed to delete root node" instead. The problem with the existing check is that several tcf_proto instances might share the same tp->data and handle-to-hnode lookup will be the same for all of them. So comparing an hnode to be deleted with tp->root won't catch the case when one tp is used to try deleting the root of another. Solution is trivial - mark the root hnodes explicitly upon allocation and check for that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
OpenPOWER on IntegriCloud