MySQL Bugs: #68537: error 2341: 'Internal program error' on production environment. Nodes shutdown.

Bug #68537	error 2341: 'Internal program error' on production environment. Nodes shutdown.
Submitted:	1 Mar 2013 10:37	Modified:	18 Jul 2016 15:57
Reporter:	Sky De Sky De
Status:	Closed
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S2 (Serious)
Version:	mysql-5.5.25 ndb-7.2.7	OS:	Linux (Red Hat 6.3)
Assigned to:	MySQL Verification Team	CPU Architecture:	Any
Tags:	error 2341

Description:
We have a cluster with two data nodes. During the creation of an index on a table of about 3,000,000 rows, both nodes shut down abruptly with the following error in mgm_1.log:

2013-02-28 10:23:58 [MgmtSrvr] ALERT -- Node 12: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
2013-02-28 10:23:58 [MgmtSrvr] ALERT -- Node 11: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

After the first node was restarted, the second one could not start and failed with the error:

2013-02-28 10:50:49 [MgmtSrvr] ALERT -- Node 12: Forced node shutdown completed. Occured during startphase 5. Caused by error 2355: 'Failure to restore schema(Resource configuration error). Permanent error, external action needed'.
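When collecting evidence for a report like this, it can help to pull the node ID, error code, and start phase out of the management log automatically. The following is a minimal sketch that assumes the alert lines look exactly like the ones quoted above; the `Shutdown` type and `parse_shutdown` function are hypothetical names introduced for illustration, not part of any MySQL tool.

```python
import re
from typing import NamedTuple, Optional


class Shutdown(NamedTuple):
    """One forced-shutdown alert from the management log (hypothetical type)."""
    timestamp: str
    node_id: int
    error_code: int
    start_phase: Optional[int]  # None when the log line names no start phase
    message: str


# Pattern derived only from the log lines quoted in this report; other
# cluster versions may format these alerts differently.
_PATTERN = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[MgmtSrvr\] ALERT -- "
    r"Node (?P<node>\d+): Forced node shutdown completed\."
    r"(?: Occurr?ed during startphase (?P<phase>\d+)\.)?"
    r" Caused by error (?P<code>\d+): '(?P<msg>.*)'\.?\s*$"
)


def parse_shutdown(line: str) -> Optional[Shutdown]:
    """Return the parsed alert, or None if the line is not a shutdown alert."""
    match = _PATTERN.match(line.strip())
    if match is None:
        return None
    phase = match.group("phase")
    return Shutdown(
        timestamp=match.group("ts"),
        node_id=int(match.group("node")),
        error_code=int(match.group("code")),
        start_phase=int(phase) if phase is not None else None,
        message=match.group("msg"),
    )
```

Running `parse_shutdown` over each line of mgm_1.log and keeping the non-None results gives a compact list of which node failed, when, and with which error code.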

The table on which we tried to create the index is now very slow and often locked, even when there is no active transaction on it. We could not run even a simple SELECT on it; the error was always:

ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

Increasing the innodb_lock_wait_timeout parameter to 120 did not help either.
So we created a new, empty table with the same structure as the one that caused the issue and created the index on it. We then repopulated it with a procedure, renamed the original table as "old", and renamed the new table to the original name. Everything now seems to work fine. The strange thing is that the renamed old table now also seems to work fine; it is no longer locked.

After that, we resynchronized the second node from scratch (with the --initial option) and it came up.

We would like to know what happened and how to avoid a similar situation in the future, because this is a very critical production environment.

How to repeat:
Create a hash index on a non-empty table.

Hello!

Thank you for the report.

I suspect possible corruption of the NDB file system (NDBFS). You did the right thing by performing an initial restart of the affected node to bring it back up.

Could you please attach the complete cluster logs? Preferably using the ndb_error_reporter utility:

  http://dev.mysql.com/doc/refman/5.5/en/mysql-cluster-programs-ndb-error-reporter.html

Hi,
any news about this issue?

Duplicate of closed bugs
 - 17772163 (fixed by 17772138 in 7.2.23)
 - 17772138 (fixed in 7.2.23)
 - 21091248 (found in 7.4.4)

Hi,

We have upgraded our MySQL to version 7.4 Enterprise Edition and the problem no longer appears.

Regards,
Nadia