Bug #79635 Events may be missing from the first epoch(s) received
Submitted: 14 Dec 2015 14:40 Modified: 12 Jan 2016 4:39
Reporter: Ole John Aske
Status: Closed
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S3 (Non-critical)
Version:7.2.22 OS:Any
Assigned to: CPU Architecture:Any

[14 Dec 2015 14:40] Ole John Aske
Description:
The NDB Event-API needs the completion of each epoch to be reported by each datanode taking part in the SUMA protocol. Each participating datanode sends a SUB_GCP_COMPLETE_REP signal, and when this signal has been received from all participants for a specific epoch, the epoch is 'completed'.
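To make the counting rule concrete, here is a minimal Python model, not NDB source code. The class `EpochTracker` and its methods are hypothetical names. An epoch counts as complete once one SUB_GCP_COMPLETE_REP per participating datanode has arrived. The model assumes the number of participants is already known.

```python
from collections import defaultdict


class EpochTracker:
    """Hypothetical model: counts SUB_GCP_COMPLETE_REP reports per epoch."""

    def __init__(self, participants: int) -> None:
        self.participants = participants   # datanodes taking part in SUMA
        self.received = defaultdict(int)   # epoch -> reports seen so far

    def on_gcp_complete_rep(self, epoch: int) -> bool:
        """Record one report; return True if this report completes the epoch."""
        if self.is_complete(epoch):
            return False  # already complete, nothing changes
        self.received[epoch] += 1
        return self.is_complete(epoch)

    def is_complete(self, epoch: int) -> bool:
        return self.received[epoch] >= self.participants


tracker = EpochTracker(participants=2)
print(tracker.on_gcp_complete_rep(100))  # False: 1 of 2 reports
print(tracker.on_gcp_complete_rep(100))  # True: epoch 100 is complete
```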

When an epoch is 'complete', it is available to be polled by the pollEvent() API and consumed by a client. Furthermore, the completion of an epoch causes the Event-API to silently ignore any further events received for that epoch. This is used internally as a mechanism to reject duplicates sent after a node failure, where the takeover SUMA can't know for sure which events were received and which were lost.
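The duplicate-rejection rule can be sketched as a small Python model, again not NDB source code. `ToyEventBuffer`, `on_event()` and `complete()` are hypothetical names. Once an epoch has been completed and handed to the poller, any later event tagged with that epoch is dropped without error.

```python
class ToyEventBuffer:
    """Hypothetical model: events for completed epochs are silently dropped."""

    def __init__(self) -> None:
        self.completed = set()  # epochs already completed
        self.pending = {}       # epoch -> list of buffered events

    def on_event(self, epoch: int, event: str) -> bool:
        """Buffer an event; return False if it was ignored as a duplicate."""
        if epoch in self.completed:
            return False  # e.g. resent by the takeover SUMA after node failure
        self.pending.setdefault(epoch, []).append(event)
        return True

    def complete(self, epoch: int) -> list:
        """Mark the epoch complete and return its events for polling."""
        self.completed.add(epoch)
        return self.pending.pop(epoch, [])


buf = ToyEventBuffer()
buf.on_event(7, "insert")
print(buf.complete(7))            # ['insert']
print(buf.on_event(7, "insert"))  # False: the late copy is ignored
```

The same rule that makes duplicate rejection cheap is what turns a premature completion into silent data loss: the buffer cannot tell a late genuine event from a duplicate.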

The number of SUB_GCP_COMPLETE_REPs to expect from the datanodes is initially not known by the Event-API; it is received as part of the SUB_START_CONF signal. However, there is a possible (expected) race between this SUB_START_CONF signal and the arrival of the first SUB_GCP_COMPLETE_REP. Thus we initialize the expected number of SUB_GCP_COMPLETE_REPs to a high, 'unknown' TOTAL_BUCKETS_INIT value and use that until the real 'count' has been communicated to us.

At this point we know there is a 'delta = TOTAL_BUCKETS_INIT - count' between the temporary unknown value we started with and the real number of SUB_GCP_COMPLETE_REPs to expect. If at this point we find that all SUB_GCP_COMPLETE_REPs had already been received, the epoch is handled as 'complete' (see above).
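The intended bookkeeping can be sketched as follows. This is a hypothetical Python model, not the NdbEventBuffer implementation: the class `EpochCounter` is an illustrative name, and the value given to TOTAL_BUCKETS_INIT is an assumed placeholder. The counter starts at TOTAL_BUCKETS_INIT, drops by one per SUB_GCP_COMPLETE_REP, and when SUB_START_CONF delivers the real count it is reduced by `TOTAL_BUCKETS_INIT - count`, leaving exactly `count - received` reports outstanding.

```python
TOTAL_BUCKETS_INIT = 1 << 15  # hypothetical placeholder for the 'unknown' count


class EpochCounter:
    """Hypothetical model of one epoch's outstanding SUB_GCP_COMPLETE_REPs."""

    def __init__(self) -> None:
        # Real count not known yet: start from the high placeholder value.
        self.remaining = TOTAL_BUCKETS_INIT

    def on_gcp_complete_rep(self) -> bool:
        """One report arrived; return True if the epoch is now complete."""
        self.remaining -= 1
        return self.remaining == 0

    def set_total_buckets(self, count: int) -> bool:
        """SUB_START_CONF delivered the real count; True if now complete."""
        delta = TOTAL_BUCKETS_INIT - count  # the correct delta
        self.remaining -= delta             # now equals count - received
        return self.remaining == 0


e = EpochCounter()
e.on_gcp_complete_rep()         # 1 of 2 reports arrives before SUB_START_CONF
print(e.set_total_buckets(2))   # False: one report still outstanding
print(e.on_gcp_complete_rep())  # True: the second report completes the epoch
```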
 
However, the calculation of the missing delta of SUB_GCP_COMPLETE_REPs was incorrect. Instead of using 'TOTAL_BUCKETS_INIT - count' as the delta, 'TOTAL_BUCKETS_INIT' was used. This resulted in any partially completed epoch being fully completed when the SUB_START_CONF signal was received. Any missing part of the epoch arriving later was then ignored as duplicate epoch data.
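The difference between the faulty and the correct delta can be shown with a small hypothetical Python model. The function names and the TOTAL_BUCKETS_INIT value are illustrative assumptions, and the model treats a non-positive outstanding count as 'complete'; the real NDB bookkeeping may differ in detail. With one of two expected reports received, the faulty delta completes the epoch immediately, while the correct delta leaves one report outstanding.

```python
TOTAL_BUCKETS_INIT = 1 << 15  # hypothetical placeholder for the 'unknown' count


def remaining_after_conf(received: int, count: int, buggy: bool) -> int:
    """Reports still outstanding after SUB_START_CONF (hypothetical model)."""
    remaining = TOTAL_BUCKETS_INIT - received  # one decrement per report so far
    delta = TOTAL_BUCKETS_INIT if buggy else TOTAL_BUCKETS_INIT - count
    return remaining - delta


def is_complete(remaining: int) -> bool:
    # Model assumption: a non-positive outstanding count means 'complete'.
    return remaining <= 0


# One of two expected reports has arrived when SUB_START_CONF is processed.
print(is_complete(remaining_after_conf(1, 2, buggy=True)))   # True: premature
print(is_complete(remaining_after_conf(1, 2, buggy=False)))  # False: waiting
```

In this model the faulty delta yields `-received`, so every epoch open at SUB_START_CONF time is completed regardless of how many reports have arrived.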

The full impact of this is not completely clear, but what we know is:

- The binlog thread depends on receiving a TE_SUBSCRIBE event in order to know which mysqld's a schema change should be distributed to. As these SUBSCRIBE events are among the first events exchanged, some of them can be lost due to this bug. This results in the mysqld's not being aware of each other, and schema changes not being correctly distributed. (Note: This is the root cause of the MTR test ndb_share.test failing.)

- Probably, parts of transactions committed in the first epoch(s) can also be missing from the binlog.

How to repeat:
Run the MTR test ndb_share with debug-compiled binaries and '--repeat=100'.
[12 Jan 2016 4:39] Jon Stephens
Documented fix in the NDB 7.2.23, 7.3.12, and 7.4.9 changelogs as follows:

    The internal NdbEventBuffer::set_total_buckets() method calculated the
    number of remaining buckets incorrectly. This caused any incomplete
    epoch to be prematurely completed when the SUB_START_CONF signal arrived
    out of order. Any events belonging to this epoch arriving later were
    then ignored, and so effectively lost, which resulted in schema changes
    not being distributed correctly among SQL nodes. 

Closed.