Bug #62177 Data node fails to restart after MaxNoOfConcurrentOperations/MaxNoOfLocalOperations change
Submitted: 16 Aug 2011 11:54 Modified: 11 Oct 2016 23:39
Reporter: Eugene Zheganin
Status: Duplicate
Category: MySQL Cluster: Cluster (NDB) storage engine  Severity: S2 (Serious)
Version: 7.1.15  OS: Linux (Debian/Squeeze amd64)
Assigned to:  CPU Architecture: Any

[16 Aug 2011 11:54] Eugene Zheganin
Description:
After changing the config.ini values from

#MaxNoOfConcurrentOperations=131072
#MaxNoOfLocalOperations=144180
to:
MaxNoOfConcurrentOperations=262144
MaxNoOfLocalOperations=288358
MaxNoOfConcurrentIndexOperations=32768

I made this configuration change in order to delete a large number of records from a table with more than 20M records. The DELETE had been terminated with the message:

ERROR 1297 (HY000): Got temporary error 233 'Out of operation records in transaction coordinator (increase MaxNoOfConcurrentOperations)' from NDBCLUSTER

I shut down ndb_mgmd, started it again with --reload, and then shut down the first data node (running on the same server as ndb_mgmd) so it would restart with the new configuration.

It did not come back up. The error was:

2011-08-16 15:32:51 [MgmtSrvr] ALERT    -- Node 2: Forced node shutdown completed. Occured during startphase 5. Caused by error 2301: 'Assertion(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
2011-08-16 15:32:51 [MgmtSrvr] ALERT    -- Node 1: Node 2 Disconnected

The problem is that reverting to the original values didn't help either.
The error is easily repeatable. To be honest, I have lost all hope of getting this node back up. The cluster is currently running on a single node.

How to repeat:
I seriously doubt that this can be repeated without actual data, but I can attach the config.ini file and the traces.
[16 Aug 2011 11:57] Eugene Zheganin
ndb_error_report output

Attachment: ndb_error_report_20110816154315.tar.bz2 (application/x-bzip, text), 619 bytes.

[16 Aug 2011 11:58] Eugene Zheganin
cluster log, taken after ndb_mgmd start and ndbd start

Attachment: ndb_1_cluster.log.bz2 (application/x-bzip, text), 2.52 KiB.

[16 Aug 2011 11:58] Eugene Zheganin
data node error log, taken at the same point

Attachment: ndb_2_error.log.bz2 (application/x-bzip, text), 665 bytes.

[16 Aug 2011 11:59] Eugene Zheganin
data node output log

Attachment: ndb_2_out.log.bz2 (application/x-bzip, text), 15.26 KiB.

[16 Aug 2011 11:59] Eugene Zheganin
main trace log, covering the last ndbd start with the --initial option

Attachment: ndb_2_trace.log.8.bz2 (application/x-bzip, text), 51.41 KiB.

[16 Aug 2011 12:01] Eugene Zheganin
I also tried starting the data node with --initial. Apart from some minor differences in the log, I saw no change.
The node crashed as usual.
[16 Aug 2011 12:46] Jonas Oreland
Hi,

I think (pretty sure) that you (also) found Bug#62116
which by now can be considered one of the worst bugs we have had
in a few years :-(

If you compile from source yourself, you can try the patch.
Otherwise, we are working on new binaries,
which I strongly suggest upgrading to...

/Jonas
[16 Aug 2011 18:01] Eugene Zheganin
Thanks, I rebuilt from source. The patch did help.

However, after patching and performing a rolling restart, the 'memoryusage' report always shows different data usage on the two nodes. I have two data nodes and NoOfReplicas is also 2. Does this mean I am losing some data across rolling restarts?

The log says everything is fine. However, data usage can differ between the nodes by 1-3%, even though the amount of memory does not change across restarts.
[11 Oct 2016 23:39] MySQL Verification Team
The issue you were hit by is solved by the patch (bug closed). As for the difference in memory usage between the data nodes, that is normal and expected.

take care
Bogdan Kecman