summaryrefslogtreecommitdiffstats
path: root/lib
diff options
context:
space:
mode:
authorEric Dumazet <edumazet@google.com>2014-09-25 23:04:56 -0700
committerDavid S. Miller <davem@davemloft.net>2014-09-29 00:04:55 -0400
commit3d9a0d2f8212879407e58d67f460d8920eb6543d (patch)
tree2f21d1f2173c017fabddec9009de8aba855b7c22 /lib
parent68f6a7c6c9817f2e6a66b59893de3c901ae5608c (diff)
downloadblackbird-obmc-linux-3d9a0d2f8212879407e58d67f460d8920eb6543d.tar.gz
blackbird-obmc-linux-3d9a0d2f8212879407e58d67f460d8920eb6543d.zip
dql: dql_queued() should write first to reduce bus transactions
While doing high throughput test on a BQL enabled NIC, I found a very high cost in ndo_start_xmit() when accessing BQL data. It turned out the problem was caused by compiler trying to be smart, but involving a bad MESI transaction : 0.05 │ mov 0xc0(%rax),%edi // LOAD dql->num_queued 0.48 │ mov %edx,0xc8(%rax) // STORE dql->last_obj_cnt = count 58.23 │ add %edx,%edi 0.58 │ cmp %edi,0xc4(%rax) 0.76 │ mov %edi,0xc0(%rax) // STORE dql->num_queued += count 0.72 │ js bd8 I got an incredible 10 % gain [1] by making sure cpu do not attempt to get the cache line in Shared mode, but directly requests for ownership. New code : mov %edx,0xc8(%rax) // STORE dql->last_obj_cnt = count add %edx,0xc0(%rax) // RMW dql->num_queued += count mov 0xc4(%rax),%ecx // LOAD dql->adj_limit mov 0xc0(%rax),%edx // LOAD dql->num_queued cmp %edx,%ecx The TX completion was running from another cpu, with high interrupts rate. Note that I am using barrier() as a soft hint, as mb() here could be too heavy cost. [1] This was a netperf TCP_STREAM with TSO disabled, but GSO enabled. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Diffstat (limited to 'lib')
0 files changed, 0 insertions, 0 deletions
OpenPOWER on IntegriCloud