Bug #78381 Dropped event operations removed too early after initial restart
Submitted: 9 Sep 2015 12:57 Modified: 15 Oct 2015 12:03
Reporter: Ole John Aske
Status: Closed
Category:MySQL Cluster: NDB API Severity:S3 (Non-critical)
Version:7.4.7 OS:Any
Assigned to: CPU Architecture:Any

[9 Sep 2015 12:57] Ole John Aske
Description:
NdbEventOperations are dropped by clients by calling dropEventOperation().
However, as queued events may still refer to a dropped event operation, its
memory cannot be released, nor the event operation destructed, until all
events possibly referring to it have been consumed.

For this purpose the Event API implementation uses an expiry mechanism
based on the GCI / epoch numbers: A dropped event operation is tagged
with an 'm_stop_gci', which identifies the GCI of the last event
possibly referring to it. Once events with this GCI, or a higher one,
have been consumed, ::deleteUsedEventOperation() will release and destruct
any expired event ops.
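
The expiry rule above can be sketched as a small toy model. This is not the NDB API implementation: DroppedOp, DroppedOpList, collect() and the numbers used are hypothetical names chosen for illustration only, assuming that "expired" means stop GCI <= last consumed GCI.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>

// Hypothetical stand-in for a dropped event operation (not the real NDB class).
struct DroppedOp {
    int id;
    std::uint64_t stop_gci;  // GCI of the last event that may still refer to this op
};

// Toy model of the GCI-based expiry list.
class DroppedOpList {
public:
    void drop(int id, std::uint64_t stop_gci) {
        ops_.push_back(DroppedOp{id, stop_gci});
    }

    // Call after the client has consumed events up to consumed_gci.
    // Releases every op whose stop_gci has been reached; returns the count.
    std::size_t collect(std::uint64_t consumed_gci) {
        std::size_t released = 0;
        for (auto it = ops_.begin(); it != ops_.end();) {
            if (it->stop_gci <= consumed_gci) {
                it = ops_.erase(it);
                ++released;
            } else {
                ++it;
            }
        }
        return released;
    }

    std::size_t pending() const { return ops_.size(); }

private:
    std::list<DroppedOp> ops_;
};

// Usage: an op dropped with stop GCI 42 survives until GCI 42 is consumed.
inline void stop_gci_example() {
    DroppedOpList dropped;
    dropped.drop(1, 42);
    assert(dropped.collect(41) == 0);  // events up to GCI 42 may still refer to op 1
    assert(dropped.collect(42) == 1);  // safe to release now
    assert(dropped.pending() == 0);
}
```

The single comparison in collect() is only safe as long as consumed_gci never goes backwards.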

This garbage collection relies on GCIs always increasing monotonically.
However, during an initial restart the GCI sequence is reset, which breaks
that assumption. This creates situations where dropped event operations are
either released too early, so that ::nextEvent() then refers to released
memory, or never released at all, so that memory leaks.

Release too early case:
 0) An event operation 'evop1' is defined, and we receive events for it.
 1) Event buffer has been polled, but not fully consumed.
 2) Node restarts.
 3) Another event operation 'evop2' is created, and more events complete for it.
 4) Client drops evop2, which gets a rather low 'stop_gci' from after the restart.
 5) Client consumes the buffered events from 1). While doing so it finds
    a rather high 'lastGCI', and thus destructs evop2 from 4)!
 6) Client polls more events, and now gets events related to the dropped evop2.
 7) When consuming these, the destructed and released evop2 is referred to,
    causing possible random behavior or crashes.
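
The steps above can be replayed in a toy model. The Event struct, the function names and the GCI values (1000 before the restart, 3 and 5 after it) are made up for illustration; the real event buffer is more involved.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

struct Event {            // hypothetical queued event
    std::uint64_t gci;    // epoch the event belongs to
    int op_id;            // event operation the event refers to
};

// Replays a consumption order using the plain "stop_gci <= consumed GCI"
// test, and counts events that refer to an already released operation.
inline int count_dangling_refs(const std::vector<Event>& consume_order,
                               int dropped_op, std::uint64_t stop_gci) {
    std::set<int> released;
    int dangling = 0;
    for (const Event& ev : consume_order) {
        if (released.count(ev.op_id)) {
            ++dangling;                    // use after release
        }
        if (stop_gci <= ev.gci) {
            released.insert(dropped_op);   // garbage collection after consuming ev
        }
    }
    return dangling;
}

inline int release_too_early_example() {
    // evop2 (id 2) was dropped after the restart with stop_gci 5.
    std::vector<Event> order = {
        {1000, 1},  // step 5: old buffered event for evop1, "high" GCI
        {3, 2},     // steps 6-7: post-restart event still referring to evop2
    };
    return count_dangling_refs(order, 2, 5);
}
```

In this model the old event with GCI 1000 triggers the release of evop2, so the later event with GCI 3 is counted as a dangling reference.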

Memory leak case: (More likely)
 0) An event operation 'evop1' is defined, and we receive events for it.
 1) Events have arrived in the complete buffer, but are not yet polled.
 2) Node restarts.
 3) Failure is detected, and evop1 is dropped; it gets a high stop_gci from
    the time of CLUSTER_FAILURE.
 4) A new event operation 'evop2' is created, and more events arrive with a
    low GCI from the new sequence.
 5) Client starts polling and consuming events. As we now only receive events
    with a low GCI, evop1 will never (or not for a long time) be garbage
    collected ... effectively there is a leak.
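
A rough way to see how long the leak can last, again as a toy model: released_after() is a hypothetical helper, and the assumption that the reset sequence advances by exactly one GCI per consumed epoch is a simplification.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if an op dropped with stop_gci gets released while consuming
// `epochs` consecutive post-restart epochs starting at first_new_gci, using
// the plain "stop_gci <= consumed GCI" test.
inline bool released_after(std::uint64_t stop_gci,
                           std::uint64_t first_new_gci,
                           std::uint64_t epochs) {
    for (std::uint64_t i = 0; i < epochs; ++i) {
        if (stop_gci <= first_new_gci + i) {
            return true;
        }
    }
    return false;
}

inline void leak_example() {
    // evop1 dropped with stop_gci 1000; the new sequence restarts at GCI 1.
    assert(!released_after(1000, 1, 999));  // still held after 999 epochs
    assert(released_after(1000, 1, 1000));  // released only once GCI 1000 recurs
}
```

The higher the pre-restart GCI, the longer the dropped operation stays allocated.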

How to repeat:
Test case will be supplied as part of the fix.

[15 Oct 2015 12:03] Jon Stephens
Fixed in NDB 7.4.9. Documented in the changelog as shown here:

    Garbage collection is performed on several objects in the
    implementation of NdbEventOperation, based on which GCIs have
    been consumed by clients, including those that have been dropped
    by Ndb::dropEventOperation(). In this implementation, the
    assumption was made that the GCI is always monotonically
    increasing, although this is not the case during an initial
    restart, when the GCI is reset. This could lead to event objects
    in the NDB API being released prematurely or not at all, in the
    latter case causing a resource leak.

    To prevent this from happening, the NDB event object
    implementation now tracks internally both the GCI and the
    generation of the GCI; the generation is incremented whenever
    the node process is restarted, and this value is now used to
    provide a monotonically increasing sequence.
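
The idea in the changelog entry can be sketched as an ordering on (generation, GCI) pairs. This is only an illustration of the principle, not the code that went into NDB 7.4.9; EpochId, expired() and all values are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <tuple>

// Hypothetical pair: a GCI qualified by the generation it belongs to.
struct EpochId {
    std::uint32_t generation;  // incremented on each restart that resets the GCI
    std::uint64_t gci;
};

// Lexicographic order: generation first, then GCI. This stays monotonic
// even when the GCI itself starts over.
inline bool operator<=(const EpochId& a, const EpochId& b) {
    return std::tie(a.generation, a.gci) <= std::tie(b.generation, b.gci);
}

// A dropped op is expired once its stop point has been consumed.
inline bool expired(const EpochId& stop, const EpochId& consumed) {
    return stop <= consumed;
}

inline void generation_example() {
    // Release-too-early case: op dropped after the restart (generation 1,
    // GCI 5) is not expired by an old buffered event (generation 0, GCI 1000).
    assert(!expired(EpochId{1, 5}, EpochId{0, 1000}));
    assert(expired(EpochId{1, 5}, EpochId{1, 5}));

    // Leak case: op dropped before the restart (generation 0, GCI 1000)
    // expires with the first consumed event of the new generation.
    assert(expired(EpochId{0, 1000}, EpochId{1, 1}));
}
```

With this ordering both scenarios from the report resolve as expected in the model.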

Closed.