Bug #78143 Reception of events stops after initial cluster restart
Submitted: 19 Aug 2015 21:43
Modified: 12 Oct 2015 12:23
Reporter: Ole John Aske
Status: Closed
Category: MySQL Cluster: NDB API
Severity: S3 (Non-critical)
Version: 7.4.7
OS: Any
Assigned to:
CPU Architecture: Any

[19 Aug 2015 21:43] Ole John Aske
Description:
If a cluster is restarted 'initial' (or fails and is restarted 'initial'),
a new sequence of increasing GCIs is produced. This means that the Event API
will start receiving GCIs which are lower than the previously completed
'm_latestGCI'.

Normally, this is a situation which may happen during a node failure,
where another node taking over as SUMA is unsure about which GCIs
the client had seen before the previous SUMA failed. Thus, the
new SUMA will resend some events which fall in a window of
uncertainty. The Event API will find that these resent events
have 'gci <= m_latestGCI' and thus ignore them.
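
A minimal sketch of that filter, written as a hypothetical buffer class
(only m_latestGCI and the 'gci <= m_latestGCI' test come from the actual
logic; all other names are illustrative):

  // Illustrative sketch only -- not the NdbEventBuffer source.
  typedef unsigned long long Uint64;

  struct EventBufferSketch
  {
    Uint64 m_latestGCI;   // highest GCI completed so far

    // Called for each event received from SUMA.
    void insertEvent(Uint64 gci)
    {
      if (gci <= m_latestGCI)
        return;  // resent duplicate from the takeover window: ignore it
      // ... otherwise buffer the event for pollEvents()/nextEvent() ...
    }
  };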

After an initial node restart the situation is different:
We might already have received GCIs up to some m_latestGCI before the restart.
As part of the restart handling, m_latestGCI needs to be cleared,
such that the new (lower) sequence of GCIs is accepted.

Currently, such a m_latestGCI reset is done by ::init_gci_containers(),
which is called after all event operations have been dropped.

However, the correctness of this logic depends on *all* event
operations being dropped before new ones are created. Such a requirement
is neither documented nor does it make sense.
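
Paraphrased as a sketch (the helpers and the counter are hypothetical; only
init_gci_containers() and the drop-all condition come from the report), the
current reset path looks roughly like this:

  // Paraphrased drop path -- not verbatim source.
  struct DropPathSketch
  {
    int m_active_op_count;  // illustrative bookkeeping of live event ops

    void init_gci_containers() { /* resets m_latestGCI, per the report */ }

    void dropEventOperation()
    {
      --m_active_op_count;
      // m_latestGCI is only reset once the *last* op is dropped, so
      // correctness silently requires all ops to be dropped before any
      // new ones are created.
      if (m_active_op_count == 0)
        init_gci_containers();
    }
  };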

NOTE: Another effect of always resetting m_latestGCI when the last
event op is dropped is that we lose the GCI used for the
final cleanup of deleted event operations (deleteUsedEventOperations()).
Such cleanup is supposed to be done by pollEvents() and nextEvent() when
they have consumed all events up to m_latestGCI. However, if the event
operations are dropped before pollEvents()/nextEvent() consume the event
buffer, m_latestGCI has already been cleared, and nothing can be deleted.
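
The lost-GCI effect can be paraphrased as below (deleteUsedEventOperations(),
pollEvents() and nextEvent() are named in the report; the exact gating
expression is an assumption):

  // Paraphrased cleanup gating -- not verbatim source.
  typedef unsigned long long Uint64;

  struct CleanupSketch
  {
    Uint64 m_latestGCI;  // prematurely cleared when the last op is dropped

    void deleteUsedEventOperations() { /* frees dropped event ops */ }

    // Called from pollEvents()/nextEvent() after consuming events.
    void maybeCleanup(Uint64 consumedUpToGCI)
    {
      // Cleanup is intended to run once everything up to m_latestGCI has
      // been consumed; if m_latestGCI was already cleared, the GCI that
      // should drive this cleanup is lost.
      if (consumedUpToGCI >= m_latestGCI)
        deleteUsedEventOperations();
    }
  };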

Use case:

 1) Event op1 is created
    ... does some work
 2) Cluster fails and is restarted 'initial'
    A CLUSTER_FAILURE event is inserted for op1
    ... does more work
 3) Event op2 is created
 4) Event op1 is dropped
    (NOTE: A state of 'no event operations' is never reached -> m_latestGCI
    is not reset)

 5) The client starts polling; as op1 has been dropped, the CLUSTER_FAILURE
    event is not returned! -> There is no way for the client to know it has
    to do failure handling!

  ... After the queue of buffered events has been drained, events effectively
stop arriving, as they are rejected by the stale (high) m_latestGCI.

It can be argued that events should be polled before step 4) drops the
event operation, to possibly catch CLUSTER_FAILURE events. However, there is
always the possibility that a failure arrives after polling has completed
and before op1 is dropped.
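
For reference, the client-side sequence can be written against the public
NDB API roughly as follows. This is a sketch only: "MY_EVENT" is a
placeholder event name, and setup of the Ndb object as well as all error
handling are elided.

  #include <NdbApi.hpp>

  void reproduceSketch(Ndb* ndb)
  {
    // 1) Create and start event op1 ("MY_EVENT" is a placeholder).
    NdbEventOperation* op1 = ndb->createEventOperation("MY_EVENT");
    op1->execute();

    // 2) ... cluster fails and is restarted 'initial' here; a
    //    TE_CLUSTER_FAILURE event is buffered for op1 ...

    // 3) Create and start op2: a state of 'no event operations' is
    //    never reached, so m_latestGCI is not reset.
    NdbEventOperation* op2 = ndb->createEventOperation("MY_EVENT");
    op2->execute();

    // 4) Drop op1 before polling: its pending TE_CLUSTER_FAILURE event
    //    will never be returned.
    ndb->dropEventOperation(op1);

    // 5) Poll: no CLUSTER_FAILURE is ever seen, and once the buffer
    //    drains, new (lower-GCI) events are rejected by the stale
    //    m_latestGCI.
    while (ndb->pollEvents(1000) > 0)
    {
      NdbEventOperation* op;
      while ((op = ndb->nextEvent()) != 0)
      {
        if (op->getEventType() == NdbDictionary::Event::TE_CLUSTER_FAILURE)
        {
          /* failure handling would go here -- never reached for op1 */
        }
      }
    }
  }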

How to repeat:
A new ATRT testcase 'test_event -n Apiv2-check_event_resumed_initial_restart'
will be supplied as part of the proposed patch.
[12 Oct 2015 12:23] Jon Stephens
Documented fix in the NDB 7.4.8 and 7.5.0 changelogs, as follows:

    After the initial restart of a node following a cluster failure,
    the cluster failure event added as part of the restart process
    was deleted when an event that existed prior to the restart was
    later deleted. This meant that, in such cases, an Event API
    client had no way of knowing that failure handling was needed.
    In addition, the GCI used for the final cleanup of deleted event
    operations performed by pollEvents() and nextEvent() when they
    have consumed all available events was lost.

Closed.