Commits

Donald Sharp authored and Paul Jakma committed 7ef4221c3f8
bgpd: Fix race in clearing completion

When a peer that is Established goes down, it is moved into the Clearing state to facilitate clearing of the routes received from the peer - remove from the RIB, reselect best path, update/delete from Zebra and to other peers etc. At the end of this, a Clearing_Completed event is generated to the FSM which will allow the peer to move out of Clearing to Idle. The issue in the code is that there is a possibility of multiple Clearing Completed events being generated for a peer, one per AFI/SAFI. Upon the first such event, the peer would move to Idle. If other events happened (e.g., new connection got established) before the last Clearing_Completed event is received, bad things can happen. Fix to ensure only one Clearing_Completed event is generated. Signed-off-by: Vivek Venkataraman <vivek@cumulusnetworks.com>
No tags

bgpd/bgp_fsm.c

Modified
399 399 read/write and timer thread. */
400 400 void
401 401 bgp_fsm_change_status (struct peer *peer, int status)
402 402 {
403 403 bgp_dump_state (peer, peer->status, status);
404 404
405 405 /* Transition into Clearing or Deleted must /always/ clear all routes..
406 406 * (and must do so before actually changing into Deleted..
407 407 */
408 408 if (status >= Clearing)
409 - bgp_clear_route_all (peer);
409 + {
410 + bgp_clear_route_all (peer);
411 +
412 + /* If no route was queued for the clear-node processing, generate the
413 + * completion event here. This is needed because if there are no routes
414 + * to trigger the background clear-node thread, the event won't get
415 + * generated and the peer would be stuck in Clearing. Note that this
416 + * event is for the peer and helps the peer transition out of Clearing
417 + * state; it should not be generated per (AFI,SAFI). The event is
418 + * directly posted here without calling clear_node_complete() as we
419 + * shouldn't do an extra unlock. This event will get processed after
420 + * the state change that happens below, so peer will be in Clearing
421 + * (or Deleted).
422 + */
423 + if (!peer->clear_node_queue->thread)
424 + BGP_EVENT_ADD (peer, Clearing_Completed);
425 + }
410 426
411 427 /* Preserve old status and change into new status. */
412 428 peer->ostatus = peer->status;
413 429 peer->status = status;
414 430
415 431 if (BGP_DEBUG (normal, NORMAL))
416 432 zlog_debug ("%s went from %s to %s",
417 433 peer->host,
418 434 LOOKUP (bgp_status_msg, peer->ostatus),
419 435 LOOKUP (bgp_status_msg, peer->status));

Everything looks good. We'll let you know here if there's anything you should know about.

Add shortcut