Bug #49560 | Hanging restart with mysqld + take-over during system restart | ||
---|---|---|---|
Submitted: | 9 Dec 2009 14:18 | Modified: | 11 Dec 2009 9:39 |
Reporter: | Jonas Oreland | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S3 (Non-critical) |
Version: | mysql-5.1-telco-7.0 | OS: | Any |
Assigned to: | Jonas Oreland | CPU Architecture: | Any |
[9 Dec 2009 14:18]
Jonas Oreland
[9 Dec 2009 15:30]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version.

You can access the patch from: http://lists.mysql.com/commits/93349

3189 Jonas Oreland 2009-12-09
ndb - bug#49560 - cleanup error handling wrt Suma not started
[9 Dec 2009 15:31]
Jonas Oreland
Careful reading of the code showed that this bug exists only in 7.0. The patch was made to 6.3 anyway, as it cleans up the code path and keeps the two versions relatively in sync.
[10 Dec 2009 6:23]
Jonas Oreland
Pushed to 6.3.29 and 7.0.10.
[11 Dec 2009 9:39]
Jon Stephens
Documented bugfix in the NDB-6.3.29 and 7.0.10 changelogs as follows:

Node takeover during a system restart occurs when the REDO log for one or more data nodes is out of date, so that a node restart is invoked for that node or those nodes. If this happens while a mysqld is attached to the cluster, the mysqld takes a global schema lock (a row lock), while trying to set up cluster-internal replication. However, this setup process could fail, causing the global schema lock to be held for an excessive length of time, which made the node restart hang as well. As a result, the mysqld failed to set up cluster-internal replication, which led to tables being read-only, and caused one node to hang during the restart.

NOTE: This issue could actually occur in MySQL Cluster NDB 7.0 only, but the fix was also applied in MySQL Cluster NDB 6.3, in order to keep the two codebases in alignment.

Closed.
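For readers who want the failure mode in concrete terms, here is a minimal Python sketch of the general idea: a lock taken for a setup step should be released on the failure path as well as on success. This is not the actual NDB patch and does not use any MySQL Cluster API. The names `setup_cluster_replication`, `try_setup`, `SetupError`, and `suma_started` are hypothetical, and a `threading.Lock` stands in for the global schema lock.

```python
import threading


class SetupError(Exception):
    """Hypothetical error: replication setup cannot proceed."""


def setup_cluster_replication(schema_lock, suma_started):
    """Attempt setup while holding the lock (illustrative only).

    The `with` statement releases the lock whether the body returns
    normally or raises, so a failed setup cannot leave it held.
    """
    with schema_lock:
        if not suma_started:
            # Assumed failure condition, loosely modelled on the
            # "Suma not started" wording in the patch message.
            raise SetupError("subscription service not started")
        return "replication ready"


def try_setup(schema_lock, suma_started):
    """Return the setup result, or None if setup failed."""
    try:
        return setup_cluster_replication(schema_lock, suma_started)
    except SetupError:
        return None


# Stand-in for the global schema lock (assumption: a plain mutex).
global_schema_lock = threading.Lock()

failed_result = try_setup(global_schema_lock, suma_started=False)
lock_free_after_failure = not global_schema_lock.locked()
```

In this sketch, a caller that sees `None` can retry later, and other work waiting on the lock (in the report, the restarting node) is not blocked by the failed attempt.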