author Doug Oucharek <doug.s.oucharek@intel.com> 2016-03-02 18:53:27 -0500
committer Greg Kroah-Hartman <gregkh@linuxfoundation.org> 2016-03-02 16:01:38 -0800
commit a70d69ae888f03b2bc754020baabdb1d6ad4782d
tree 4eeaf9f69ee1360f3ecaea6e3c4acc7a77632fbf
parent 992f0b226e4e49c9c3daccc19418cb2fdf2be1cf
download blackbird-op-linux-a70d69ae888f03b2bc754020baabdb1d6ad4782d.tar.gz
blackbird-op-linux-a70d69ae888f03b2bc754020baabdb1d6ad4782d.zip
staging: lustre: Change connect peer failed cleanup order
A race condition has been found where connd is cleaning up failed connections: the peer ref counter goes to zero, but we still have a connecting counter > 0. One possible race is when we are retrying a connection by calling kiblnd_connect_peer(), which itself fails, decrements the peer ref counter, and gets swapped out before it can decrement the connecting counter. connd swaps in and cleans up the connection, where it sees a peer ref counter of 1 and a connecting counter of 1. This will trigger the assert seen in LU-7210 when it decrements the peer counter.

The solution: be sure to decrement the connecting counter before decrementing the peer counter in the peer connect failure path.

Signed-off-by: Doug Oucharek <doug.s.oucharek@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7210
Reviewed-on: http://review.whamcloud.com/17004
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-rw-r--r-- drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c 2
1 files changed, 2 insertions, 0 deletions
diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
index f76c57074529..7602d7142461 100644
--- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -1298,8 +1298,10 @@ kiblnd_connect_peer(kib_peer_t *peer)
 	return;
 
 failed2:
+	kiblnd_peer_connect_failed(peer, 1, rc);
 	kiblnd_peer_decref(peer); /* cmid's ref */
 	rdma_destroy_id(cmid);
+	return;
 failed:
 	kiblnd_peer_connect_failed(peer, 1, rc);
 }
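The ordering argument in the commit message can be checked with a minimal, single-threaded C model of the failure path. It assumes the peer starts with two references (the cmid's ref plus one that connd drops during cleanup) and a connecting count of 1, and it simulates connd being swapped in by calling connd_cleanup() between the two decrements. Everything here (struct model_peer, invariant_ok(), connd_cleanup(), old_order(), new_order(), check_orders()) is hypothetical and not part of the o2iblnd code. It tracks only the two counters, not the locking or the other work kiblnd_peer_connect_failed() does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified peer: NOT the real kib_peer_t. */
struct model_peer {
	int refcount;   /* stands in for the peer ref counter */
	int connecting; /* stands in for the connecting counter */
};

/* The condition the assert enforces in this model: once the last
 * reference is gone, no connect attempt may still be counted. */
bool invariant_ok(const struct model_peer *p)
{
	return p->refcount > 0 || p->connecting == 0;
}

/* Models connd swapping in: it drops the reference it holds and
 * reports whether the invariant still holds afterwards. */
bool connd_cleanup(struct model_peer *p)
{
	p->refcount--;
	return invariant_ok(p);
}

/* Old failure path: peer ref dropped first, connecting count last. */
bool old_order(void)
{
	struct model_peer p = { .refcount = 2, .connecting = 1 };
	bool ok;

	p.refcount--;           /* kiblnd_peer_decref(): cmid's ref */
	ok = connd_cleanup(&p); /* connd runs here: refcount 0, connecting 1 */
	p.connecting--;         /* connect-failed handling, too late */
	return ok;
}

/* New failure path: connecting count dropped before the peer ref. */
bool new_order(void)
{
	struct model_peer p = { .refcount = 2, .connecting = 1 };
	bool ok;

	p.connecting--;         /* connect-failed handling first */
	ok = connd_cleanup(&p); /* connd runs here: refcount 1, connecting 0 */
	p.refcount--;           /* then drop the cmid's ref */
	return ok && invariant_ok(&p);
}

/* Self-check of both orderings under the model's assumptions. */
void check_orders(void)
{
	assert(!old_order());
	assert(new_order());
}
```

Under these assumptions old_order() returns false, because connd drops the last reference while the connecting count is still 1 (the situation the LU-7210 assert catches), and new_order() returns true, because the connecting count is already 0 by the time the reference count can reach zero.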