| Bug #46412 | NDBRequire hit in Dbdih::invalidateLcpInfoAfterSr | ||
|---|---|---|---|
| Submitted: | 27 Jul 2009 18:02 | Modified: | 18 Aug 2009 15:03 |
| Reporter: | Andrew Hutchings | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S2 (Serious) |
| Version: | 6.3.24 | OS: | Any |
| Assigned to: | Jonas Oreland | CPU Architecture: | Any |
[10 Aug 2009 13:43]
Martin Skold
Does restarting the node manually (possibly with --initial) solve the problem?
[10 Aug 2009 15:14]
Andrew Hutchings
The cluster was restored from backup before --initial was tried and the problem could not be reproduced since.
[17 Aug 2009 13:16]
Jonas Oreland
reproduced...using 2 new error inserts
[18 Aug 2009 6:57]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:
http://lists.mysql.com/commits/80966
3010 Jonas Oreland 2009-08-18
ndb - bug#46412 Fix/handle incorrectly set lcp-bits during system restart
[18 Aug 2009 7:02]
Jonas Oreland
pushed to 6.3.26 and 7.0.7
docs:
1) lcp starts
2) master dies almost directly afterwards
3) rest of cluster dies within 1-2s
4) crash when restarting
[18 Aug 2009 15:03]
Jon Stephens
Documented bugfix in the NDB-6.3.26 and 7.0.7 changelogs as follows:
Killing MySQL Cluster nodes immediately following a local checkpoint could
lead to a crash of the cluster when later attempting to perform a system
restart.
The exact sequence of events causing this issue was as follows:
1. Local checkpoint occurs.
2. Immediately following the LCP, kill the master data node.
3. Kill the remaining data nodes within a few seconds of killing the
master.
4. Attempt to restart the cluster.

Description: During a cluster restart a node hit the ndbrequire in the function above during startphase 4. I believe this is because a node is in LCP when it shouldn't be. Time: Monday 27 July 2009 - 13:45:59 Status: Temporary error, restart node Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug) Error: 2341 Error data: dbdih/DbdihMain.cpp Error object: DBDIH (Line: 9186) 0x0000000a Program: /usr/mysql/libexec/ndbd Pid: 18674 Trace: /user/database/log/ndb_4_trace.log.2 Version: mysql-5.1.32 ndb-6.3.24-GA DBDIH 000658 000694 000769 014798 014798 014778 014774 014778 014774 014778 014774 014778 014774 014774 014774 014798 014798 014798 014798 014798 014798 014798 014798 014798 014798 014798 014798 014798 014798 014798 014798 014798 014798 014798 014798 014798 014798 014798 014798 014798 014798 014798 014798 014798 014798 014798 014798 014798 014798 014798 014798 014843 014843 014833 014833 014833 014833 014833 014833 014833 014833 014833 014833 014843 014843 014843 014843 014843 014843 014843 014843 014843 014843 014843 014843 014843 014843 014843 014843 014843 014843 014843 014843 014843 014843 014843 014843 014843 014843 014843 014843 014843 014843 014843 014843 014843 014843 014843 014843 014854 014858 014854 014858 014854 014858 014854 014858 014854 014858 014854 014854 014854 014854 014854 014854 014854 014854 014854 014854 014854 014854 014854 014854 014854 014854 014854 014854 014854 014854 014854 014854 014854 014854 014854 014854 014854 014854 014854 014854 014854 014854 014854 014854 014854 014854 014854 014854 014854 014854 014854 014854 014854 014854 014896 014896 014896 014896 014896 014896 014896 014896 014896 014896 014896 014896 014896 014896 014896 014896 014896 014896 014896 014896 014896 014896 014896 014896 014896 014896 014896 014896 014896 014896 014896 014896 014896 014896 014896 014896 014896 014896 014896 014896 014896 014896 014896 014896 014896 014896 014896 
014896 000778 009172 009175 009196 009172 009175 009196 009172 009175 009188 009172 009172 009175 009188 009172 009172 009175 009188 009172 009172 009175 009188 009172 009172 009175 009186

How to repeat: Unknown
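The DBDIH jump trace in the description is highly repetitive, which makes the last steps before the failed ndbrequire hard to pick out. The following is a minimal sketch, not part of any NDB tooling, that collapses consecutive repeats so the path stands out. It assumes the entries are whitespace-separated source line numbers, which the final entry `009186` matching `Line: 9186` in the error object suggests; the helper name `collapse_trace` is hypothetical.

```python
from itertools import groupby


def collapse_trace(trace: str) -> list[tuple[str, int]]:
    """Run-length encode a whitespace-separated jump trace.

    Returns (entry, repeat_count) pairs in the original order.
    """
    return [(entry, sum(1 for _ in run)) for entry, run in groupby(trace.split())]


# Tail of the trace from the report: the last entries before the crash.
tail = "009172 009175 009188 009172 009172 009175 009186"

for entry, count in collapse_trace(tail):
    print(f"{entry} x{count}")
```

Pasting the full trace into `collapse_trace` reduces the long runs to one pair each, leaving the short alternating sequences visible.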