Bug #68537 error 2341: 'Internal program error' on production environment. Nodes shutdown.
Submitted: 1 Mar 2013 10:37
Modified: 18 Jul 2016 15:57
Reporter: Sky De Sky De
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S2 (Serious)
Version: mysql-5.5.25 ndb-7.2.7
OS: Linux (Red Hat 6.3)
Assigned to: MySQL Verification Team
CPU Architecture: Any
Tags: error 2341

[1 Mar 2013 10:37] Sky De Sky De
Description:
We have a cluster with two data nodes. While creating an index on a table of about 3,000,000 rows, both nodes shut down abruptly, and mgm_1.log recorded the following errors:

2013-02-28 10:23:58 [MgmtSrvr] ALERT    -- Node 12: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
2013-02-28 10:23:58 [MgmtSrvr] ALERT    -- Node 11: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

After the first node was restarted, the second one failed to start with the error:

2013-02-28 10:50:49 [MgmtSrvr] ALERT    -- Node 12: Forced node shutdown completed. Occured during startphase 5. Caused by error 2355: 'Failure to restore schema(Resource configuration error). Permanent error, external action needed'.
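
For triage, the node ID, start phase, and error code can be pulled out of ALERT lines like the ones quoted above. The following is a minimal Python sketch, not part of any MySQL tool; the function name `parse_forced_shutdowns` and the returned field names are hypothetical, and the pattern assumes the management log keeps exactly the line layout shown in this report.

```python
import re

# Pattern for "Forced node shutdown completed" ALERT lines as quoted in this
# report. The log itself spells "Occured" with one "r"; both spellings are
# accepted. The layout is an assumption based only on the lines shown here.
ALERT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[MgmtSrvr\] ALERT\s+-- "
    r"Node (?P<node>\d+): Forced node shutdown completed\."
    r"(?: Occurr?ed during startphase (?P<phase>\d+)\.)?"
    r" Caused by error (?P<code>\d+): '(?P<msg>.*)'\.$"
)


def parse_forced_shutdowns(lines):
    """Return one dict per forced-shutdown ALERT line; other lines are skipped."""
    events = []
    for line in lines:
        match = ALERT_RE.match(line.strip())
        if match is None:
            continue
        phase = match.group("phase")
        events.append({
            "timestamp": match.group("ts"),
            "node": int(match.group("node")),
            # None when the shutdown did not happen during a start phase
            "start_phase": int(phase) if phase is not None else None,
            "error_code": int(match.group("code")),
            "message": match.group("msg"),
        })
    return events
```

Passing the lines of mgm_1.log to `parse_forced_shutdowns` would yield one entry per shutdown, which makes it easy to see that nodes 11 and 12 both failed with error 2341 and that node 12 later failed in start phase 5 with error 2355.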

The table on which we tried to create the index became very slow and was often locked, even when there was no active transaction on it. We could not run even a simple SELECT on it; the error was always:

ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

Increasing the innodb_lock_wait_timeout parameter to 120 did not help either.
So we created a new, empty table with the same structure as the one that caused the issue and created the index on it. We then repopulated it with a procedure, renamed the original table to "old" and gave the new table the original name, and everything seems to work fine. The odd thing is that the renamed old table now also seems to work fine; it is no longer locked.
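
The rebuild-and-swap steps described above can be sketched as an ordered list of SQL statements. This is only an illustration in Python: the helper name `rebuild_and_swap_statements`, the `_new`/`_old` suffixes, and the table, index, and column names are all hypothetical, and the single INSERT ... SELECT is a simplifying assumption standing in for the procedure that was actually used to repopulate the table.

```python
def rebuild_and_swap_statements(table, index_name, columns):
    """Return the SQL statements for the rebuild-and-swap workaround, in order.

    All identifiers are supplied by the caller; nothing here is taken from
    the real schema, which is not shown in this report.
    """
    new_table = f"{table}_new"  # hypothetical name for the empty copy
    old_table = f"{table}_old"  # hypothetical name for the retired original
    cols = ", ".join(f"`{c}`" for c in columns)
    return [
        # 1. Empty table with the same structure as the problematic one
        f"CREATE TABLE `{new_table}` LIKE `{table}`",
        # 2. Create the index while the table is still empty
        f"CREATE INDEX `{index_name}` ON `{new_table}` ({cols})",
        # 3. Repopulate (assumption: the reporter used a procedure instead)
        f"INSERT INTO `{new_table}` SELECT * FROM `{table}`",
        # 4. Swap the names in a single RENAME TABLE statement
        f"RENAME TABLE `{table}` TO `{old_table}`, `{new_table}` TO `{table}`",
    ]
```

Creating the index before loading the rows avoids building it over the full table, and swapping both names in one RENAME TABLE statement means clients never see a moment where the original table name is missing.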

After that we resynced the second node from scratch (--initial option) and it came up.

We would like to know what happened and how to avoid a similar situation in the future, because this is a critical production environment.

How to repeat:
Create a hash index on a non-empty table.
[4 Mar 2013 8:06] MySQL Verification Team
Hello!

Thank you for the report.

I suspect filesystem corruption in the NDBFS. Performing an initial restart of the affected node to bring it up was the right thing to do.

Could you please attach the complete cluster logs? Preferably using the ndb_error_reporter utility:

  http://dev.mysql.com/doc/refman/5.5/en/mysql-cluster-programs-ndb-error-reporter.html
[4 Apr 2013 8:30] Sky De Sky De
Hi,
any news about this issue?
[18 Jul 2016 15:56] MySQL Verification Team
Duplicate of closed bugs
 - 17772163 (fixed by 17772138 in 7.2.23)
 - 17772138 (fixed in 7.2.23)
 - 21091248 (found in 7.4.4)
[18 Jul 2016 15:57] Sky De Sky De
Hi,

We have updated our MySQL to version 7.4 Enterprise Edition and the problem no longer occurs.

Regards,
Nadia