Bug #74503 | Dict operations during TAKEOVER may crash new master | ||
---|---|---|---|
Submitted: | 22 Oct 2014 11:29 | Modified: | 4 Nov 2014 18:32 |
Reporter: | Ole John Aske | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Cluster: Disk Data | Severity: | S1 (Critical) |
Version: | 7.1.33 | OS: | Any |
Assigned to: | | CPU Architecture: | Any |
[22 Oct 2014 11:29]
Ole John Aske
[22 Oct 2014 13:17]
Ole John Aske
Posted by developer: Note: A 4-node config is required in order to run all the testcases in 'testDict -n schemaTrans'
[4 Nov 2014 18:32]
Jon Stephens
Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release.

Documented fix in the NDB 7.1.34, 7.2.19, and 7.3.8 changelogs, as follows:

When a node acting as a DICT master fails, the arbitrator selects another node to take over in place of the failed node. During the takeover procedure, which includes cleaning up any schema transactions which are still open when the master failed, the disposition of the uncommitted schema transaction is decided. Normally this transaction is rolled back, but if it has completed a sufficient portion of a commit request, the new master finishes processing the commit. Until the fate of the transaction has been decided, no new TRANS_END_REQ messages from clients can be processed. In addition, since multiple concurrent schema transactions are not supported, takeover cleanup must be completed before any new transactions can be started.

A similar restriction applies to any schema operations which are performed in the scope of an open schema transaction. The counter used to coordinate schema operations across all nodes is employed both during takeover processing and when executing any non-local schema operations. This means that starting a schema operation while its schema transaction is in the takeover phase causes this counter to be overwritten by concurrent uses, with unpredictable results.

The scenarios just described were previously handled using a pseudo-random delay when recovering from a node failure. Now we check whether the new master has finished rolling forward or backward any schema transactions remaining after the failure of the previous master, and avoid starting new schema transactions or performing operations using old transactions until takeover processing has cleaned up after the abandoned transaction.

Closed.

If necessary, you can access the source repository and build the latest available version, including the bug fix. More information about accessing the source trees is available at http://dev.mysql.com/doc/en/installing-source.html
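
The gating behavior described in the changelog entry can be illustrated with a small toy model. This is only a sketch in Python, not the NDB source: the class and field names (`NewDictMaster`, `SchemaTrans`, `commit_started`) and the request labels other than TRANS_END_REQ are hypothetical. It also assumes that gated requests are simply refused while takeover is pending; the real kernel may defer or queue them instead.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Fate(Enum):
    ROLLED_BACK = auto()
    ROLLED_FORWARD = auto()


@dataclass
class SchemaTrans:
    trans_id: int
    # Hypothetical stand-in for "has completed a sufficient portion
    # of a commit request" in the changelog text.
    commit_started: bool = False


# Request kinds that must wait for takeover cleanup (labels are
# illustrative, apart from TRANS_END_REQ which the changelog names).
GATED = {"TRANS_BEGIN", "SCHEMA_OP", "TRANS_END_REQ"}


@dataclass
class NewDictMaster:
    """Toy model of a node that has just taken over as DICT master."""
    orphaned: list = field(default_factory=list)  # left open by the failed master
    takeover_done: bool = False
    fates: dict = field(default_factory=dict)

    def accept(self, request: str) -> bool:
        """Return True if the request may be processed right now."""
        if request in GATED:
            # Assumption: refuse instead of delaying for a random interval.
            return self.takeover_done
        return True

    def run_takeover(self) -> None:
        """Decide the fate of every orphaned transaction, then open the gate."""
        for trans in self.orphaned:
            if trans.commit_started:
                self.fates[trans.trans_id] = Fate.ROLLED_FORWARD
            else:
                self.fates[trans.trans_id] = Fate.ROLLED_BACK
        self.orphaned.clear()
        self.takeover_done = True
```

In this model, `accept("SCHEMA_OP")` returns False until `run_takeover()` has decided every orphaned transaction, so no gated request can run concurrently with takeover processing. That explicit check takes the place of the pseudo-random delay mentioned above.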