MySQL Bugs: #54406: ndbd cannot start due to error 721

Bug #54406	ndbd cannot start due to error 721
Submitted:	10 Jun 2010 17:42	Modified:	12 Oct 2010 13:44
Reporter:	Nicholas Hill
Status:	Duplicate
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S1 (Critical)
Version:	mysql-5.1-telco-7.1	OS:	Linux (CentOS 5 x86_64)
Assigned to:	Magnus Blåudd	CPU Architecture:	Any
Tags:	7.1.3, 721, dbdict, error 721, MySQL Cluster, ndb, ndb-7.1.3, ndbd, redo

Description:
After a complete cluster shutdown to increase the number of virtual CPUs in VMware ESXi, I was unable to bring the ndbd nodes back up because ndbd reported the following error:

2010-06-10 13:23:16 [ndbd] INFO     -- Failure to recreate object during restart, error 721 Please follow instructions from 'perror --ndb 721'
2010-06-10 13:23:16 [ndbd] INFO     -- DBDICT (Line: 4237) 0x00000002
error=2355
2010-06-10 13:23:16 [ndbd] INFO     -- Error handler startup shutting down system
2010-06-10 13:23:16 [ndbd] INFO     -- Error handler shutdown completed - exiting
sphase=4
exit=-1

How to repeat:
After a complete (clean) cluster shutdown, I restarted the ndbd nodes and received the previous error.

Suggested fix:
A fix for this problem was supposed to have been included in version 7.1.3.

ndb_error_report log

Attachment: ndb_error_report_20100610133239.tar.bz2 (application/octet-stream, text), 393.99 KiB.

This is similar to Bug #52135

The error looks like Bug #52135, but this occurs in a version that already contains the fix for it.

After running an ndbd --initial on each of my ndb nodes, the cluster ran properly once again.

I then started a rolling restart by shutting down one of the nodes through ndb_mgm and restarting it. The node failed with the same error, and I am no longer able to restart it, even with the --initial switch.

ndb_error_report to follow

I have uploaded the file ndb_error_report_54406.tar.bz2 to the write-only FTP.

Here is a schema-only dump (mysqldump -A --no-data):

Attachment: mysqldump_54406.sql (application/octet-stream, text), 45.35 KiB.

I haven't been able to reproduce the problem with these schemas yet.

This might be related to Bug #54651. That bug can leave the cluster dictionary in an invalid state, after which any node restart fails with error 721.

Analysis of the attached trace files shows that this is a duplicate of Bug #54651, which allows a table to be renamed (altered) to the same name as an already existing table. The duplicate table name is not detected until the next node or system restart, which then causes the error message above to be printed.
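To illustrate why the failure only appears at restart, here is a toy model in Python. It is a hypothetical sketch, not NDB or DBDICT code: the class `SchemaDict` and its methods are invented for illustration. Creating an object checks for a name clash, but the modeled rename (standing in for the Bug #54651 behavior) does not, so the duplicate is only caught when a restart recreates every object from the stored definitions.

```python
class SchemaDict:
    """Hypothetical toy model of a cluster dictionary (not real NDB code)."""

    def __init__(self):
        self.objects = {}  # object id -> table name
        self._next_id = 0

    def create(self, name):
        # Creation rejects a name that is already in use.
        if name in self.objects.values():
            raise ValueError("error 721: object with name %r already exists" % name)
        self._next_id += 1
        self.objects[self._next_id] = name
        return self._next_id

    def rename_buggy(self, obj_id, new_name):
        # Models the Bug #54651 behavior: no check that new_name is taken.
        self.objects[obj_id] = new_name

    def restart(self):
        # A restart recreates every stored object, so the name check runs again.
        rebuilt = SchemaDict()
        for obj_id in sorted(self.objects):
            rebuilt.create(self.objects[obj_id])  # raises on the duplicate
        return rebuilt


if __name__ == "__main__":
    d = SchemaDict()
    d.create("t1")
    t2 = d.create("t2")
    d.rename_buggy(t2, "t1")  # accepted silently
    try:
        d.restart()
    except ValueError as err:
        print("restart failed:", err)
```

In this model the rename succeeds and the dictionary keeps working until `restart()` is called, which mirrors the report: the cluster ran normally after `--initial`, and the problem only surfaced on the next node restart.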

Since this problem can occur as part of an upgrade from a version in which Bug #54651 has not yet been fixed, we will make the error message printed in this case more helpful and stop referring to "perror --ndb 721", since that is not very helpful.