Bug #74503 Dict operations during TAKEOVER may crash new master
Submitted: 22 Oct 2014 11:29 Modified: 4 Nov 2014 18:32
Reporter: Ole John Aske
Status: Closed
Category:MySQL Cluster: Disk Data Severity:S1 (Critical)
Version:7.1.33 OS:Any
Assigned to: CPU Architecture:Any

[22 Oct 2014 11:29] Ole John Aske
Description:
When a node acting as a DICT master fails, the arbitrator will select another node to take over the DICT master responsibility. The takeover procedure involves cleaning up any schema transactions which are still open when the master failed.

During this takeover period the outcome of the still-open schema transaction is decided: it would normally be rolled back, but if it has completed a sufficient part of a 'commit' request, the new master will complete the commit processing. Until the fate of the transaction has been decided, we have to hold back any TRANS_END_REQs from the clients.
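The roll back / roll forward decision above can be sketched roughly as follows. This is only an illustration: the TransState values, the TakeoverAction enum and decideTakeover() are hypothetical names, and treating "commit started" as the point of no return is an assumption, not the actual rule used in DICT.

```cpp
#include <cstdint>

// Hypothetical, ordered states of a schema transaction at the time the
// old master failed. These are illustrative, not the real DICT states.
enum class TransState : std::uint8_t {
  Started,
  Prepared,
  CommitStarted,
  CommitDone
};

enum class TakeoverAction { RollBack, RollForward };

// Assumption: once the commit has started, "a sufficient part" of the
// commit request is considered done and the new master completes it.
// Anything earlier is rolled back.
inline TakeoverAction decideTakeover(TransState stateAtFailure) {
  return (stateAtFailure >= TransState::CommitStarted)
             ? TakeoverAction::RollForward
             : TakeoverAction::RollBack;
}
```

Until decideTakeover() (or whatever the real equivalent is) has produced an answer and it has been carried out, a client's TRANS_END_REQ for that transaction cannot be answered meaningfully, which is why it has to be held back.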

Furthermore, the dict implementation does not support multiple concurrent schema transactions. Thus, the above takeover cleanup has to be completed before any new transactions can be started.
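A minimal sketch of the kind of explicit check this implies, assuming a single flag for "takeover in progress" and a single slot for the one allowed schema transaction. SchemaTransGate and BeginResult are hypothetical names for illustration and do not correspond to real DICT classes.

```cpp
// Result of a request to begin a new schema transaction.
enum class BeginResult { Accepted, Busy };

// Hypothetical gate: at most one schema transaction at a time, and none
// while takeover cleanup of an abandoned transaction is still running.
class SchemaTransGate {
public:
  // Called when this node becomes the new master after a master failure.
  void startTakeover() { m_takeoverActive = true; }

  // Called once the abandoned transaction has been rolled back or forward;
  // at that point no schema transaction is open any more.
  void finishTakeover() {
    m_takeoverActive = false;
    m_openTrans = false;
  }

  // A client asks to start a schema transaction.
  BeginResult begin() {
    if (m_takeoverActive || m_openTrans) return BeginResult::Busy;
    m_openTrans = true;
    return BeginResult::Accepted;
  }

  // The open schema transaction has ended normally.
  void end() { m_openTrans = false; }

private:
  bool m_takeoverActive = false;
  bool m_openTrans = false;
};
```

With a gate like this, a request arriving in the vulnerable phase gets an explicit Busy answer and can be retried by the client, instead of relying on timing alone.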

A similar restriction also applies to any schema operations which are done in the scope of an open schema transaction: the transaction's 'SafeCounter m_counter' is used to coordinate the different schema operation steps across all nodes. It is used both during the takeover processing and when executing any 'non local' schema operations. Thus, starting a schema operation while its schema transaction is in the takeover phase will cause m_counter to be garbled by the two concurrent users, and the outcome is rather unpredictable.
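To illustrate how two users of one counter can garble it, here is a simplified stand-in. ReplyCounter is a hypothetical class and not the real SafeCounter API; the node ids are made up, and the real failure mode may differ in detail.

```cpp
#include <cstdint>

// Simplified stand-in for a per-transaction reply counter: it tracks
// which nodes still owe a reply for the current round.
class ReplyCounter {
public:
  // Start a new round: expect one reply from every node in nodeMask.
  void init(std::uint32_t nodeMask) { m_waiting = nodeMask; }

  // Record a reply from nodeId; returns true when nothing is outstanding.
  bool clearWaitingFor(unsigned nodeId) {
    m_waiting &= ~(std::uint32_t{1} << nodeId);
    return m_waiting == 0;
  }

  bool isWaitingFor(unsigned nodeId) const {
    return ((m_waiting >> nodeId) & 1u) != 0;
  }

private:
  std::uint32_t m_waiting = 0;
};

// Two users share one counter: the takeover round waits for nodes 2, 3
// and 4, then a schema operation re-initialises the same counter for
// nodes 2 and 3 only. The round now looks complete after two replies,
// although node 4 never answered the takeover request.
inline bool takeoverRoundLooksDoneTooEarly() {
  ReplyCounter counter;
  counter.init((1u << 2) | (1u << 3) | (1u << 4)); // takeover round
  counter.init((1u << 2) | (1u << 3));             // concurrent schema op
  counter.clearWaitingFor(2);
  return counter.clearWaitingFor(3);
}
```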

The scenarios described above are normally hidden by a pseudo-random ~100ms delay in the retry logic in NdbDictInterface::dictSignal() when it recovers from a node failure. Normally this is sufficient to let the takeover complete without any new requests arriving in the vulnerable phase. However, there are no guarantees without explicit checking for this, and we are seeing random failures in this code from time to time (AutoTest).
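For reference, the shape of that retry delay is sketched below, following the expression 'sleep + 10 * (rand() % mod)' from dictSignal(). The function names are hypothetical, the random value is passed in to keep the sketch deterministic, and the base and modulus values are left as parameters because the values actually used are not stated here.

```cpp
#include <cstdint>

// Delay in milliseconds before retry number `attempt`.
// attempt == 0 is the first try, which is not delayed.
// `rnd` stands in for the value returned by rand().
inline std::uint32_t retryDelayMs(std::uint32_t attempt, std::uint32_t baseMs,
                                  std::uint32_t mod, std::uint32_t rnd) {
  if (attempt == 0) return 0;
  if (mod == 0) return baseMs; // avoid division by zero in the sketch
  return baseMs + 10 * (rnd % mod);
}

// Upper bound of the delay for a given base and modulus.
inline std::uint32_t maxRetryDelayMs(std::uint32_t baseMs, std::uint32_t mod) {
  return (mod == 0) ? baseMs : baseMs + 10 * (mod - 1);
}
```

Whatever the concrete values are, the delay is bounded, so a takeover that takes longer than maxRetryDelayMs() still lets a retried request arrive in the vulnerable phase.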

How to repeat:
Reduce/remove the retry delay in ::dictSignal():

=== modified file 'storage/ndb/src/ndbapi/NdbDictionaryImpl.cpp'
--- storage/ndb/src/ndbapi/NdbDictionaryImpl.cpp	revid:ole.john.aske@oracle.com-20141021143311-wsphxp6k4rtj2bus
+++ storage/ndb/src/ndbapi/NdbDictionaryImpl.cpp	2014-10-22 08:18:52 +0000
@@ -2358,7 +2358,7 @@
 
   for(Uint32 i = 0; i<RETRIES; i++)
   {
-    if (i > 0)
+    if (i > 50)
     {
       Uint32 t = sleep + 10 * (rand() % mod);

Then run the AutoTest 'testDict -n schemaTrans'
[22 Oct 2014 13:17] Ole John Aske
Posted by developer:
 
Note: A 4-node config is required in order to run all the testcases in 'testDict -n schemaTrans'
[4 Nov 2014 18:32] Jon Stephens
Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release.

Documented fix in the NDB 7.1.34, 7.2.19, and 7.3.8 changelogs, as follows:

        When a node acting as a DICT master fails, the arbitrator
        selects another node to take over in place of the failed node.
        During the takeover procedure, which includes cleaning up any
        schema transactions which are still open when the master failed,
        the disposition of the uncommitted schema transaction is
        decided. Normally this transaction is rolled back, but if it has
        completed a sufficient portion of a commit request, the new
        master finishes processing the commit. Until the fate of the
        transaction has been decided, no new TRANS_END_REQ messages from
        clients can be processed. In addition, since multiple concurrent
        schema transactions are not supported, takeover cleanup must be
        completed before any new transactions can be started.

        A similar restriction applies to any schema operations which are
        performed in the scope of an open schema transaction. The
        counter used to coordinate schema operation across all nodes is
        employed both during takeover processing and when executing any
        non-local schema operations. This means that starting a schema
        operation while its schema transaction is in the takeover phase
        causes this counter to be overwritten by concurrent uses, with
        unpredictable results.

        The scenarios just described were previously handled using a
        pseudo-random delay when recovering from a node failure. Now we
        check before the new master has rolled forward or backwards any
        schema transactions remaining after the failure of the previous
        master and avoid starting new schema transactions or performing
        operations using old transactions until takeover processing has
        cleaned up after the abandoned transaction.

Closed.

If necessary, you can access the source repository and build the latest available version, including the bug fix. More information about accessing the source trees is available at

    http://dev.mysql.com/doc/en/installing-source.html