MySQL Bugs: #62177: restart data node after MaxNoOfConcurrentOperations/MaxNoOfLocalOperation change

Bug #62177	restart data node after MaxNoOfConcurrentOperations/MaxNoOfLocalOperation change
Submitted:	16 Aug 2011 11:54	Modified:	11 Oct 2016 23:39
Reporter:	Eugene Zheganin
Status:	Duplicate
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S2 (Serious)
Version:	7.1.15	OS:	Linux (Debian/Squeeze amd64)
Assigned to:		CPU Architecture:	Any

Description:
After changing config.ini values from

#MaxNoOfConcurrentOperations=131072
#MaxNoOfLocalOperations=144180
to:
MaxNoOfConcurrentOperations=262144
MaxNoOfLocalOperations=288358
MaxNoOfConcurrentIndexOperations=32768

This configuration change was made in order to delete a large number of records from a table with more than 20M records. The DML had been terminated with the message:

ERROR 1297 (HY000): Got temporary error 233 'Out of operation records in transaction coordinator (increase MaxNoOfConcurrentOperations)' from NDBCLUSTER
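As a side note on the values above: the NDB documentation, as far as I recall, describes the default for MaxNoOfLocalOperations as roughly 1.1 times MaxNoOfConcurrentOperations. The short Python sketch below only checks that the old and new values quoted in this report follow that ratio. The dictionary layout and the function name are illustrative assumptions, not part of any NDB tool.

```python
# Values quoted in this report. The "old" pair was commented out in config.ini.
settings = {
    "old": {"MaxNoOfConcurrentOperations": 131072, "MaxNoOfLocalOperations": 144180},
    "new": {"MaxNoOfConcurrentOperations": 262144, "MaxNoOfLocalOperations": 288358},
}


def local_to_concurrent_ratio(cfg):
    """Return MaxNoOfLocalOperations divided by MaxNoOfConcurrentOperations."""
    return cfg["MaxNoOfLocalOperations"] / cfg["MaxNoOfConcurrentOperations"]


for label, cfg in settings.items():
    # Both pairs come out very close to 1.1, so the change keeps the same proportion.
    print(label, round(local_to_concurrent_ratio(cfg), 4))
```

Both pairs keep the same proportion, so the change doubles the operation-record limits without altering the ratio between them.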

I shut down ndb_mgmd, started it with --reload, and shut down the first data node (living on the same server as ndb_mgmd).

It did not start again. The error was:

2011-08-16 15:32:51 [MgmtSrvr] ALERT    -- Node 2: Forced node shutdown completed. Occured during startphase 5. Caused by error 2301: 'Assertion(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
2011-08-16 15:32:51 [MgmtSrvr] ALERT    -- Node 1: Node 2 Disconnected

Reverting to the original values didn't help either.
The error is quite repeatable. To be honest, I have lost all hope of restarting this node. At this time the cluster is running on a single node.

How to repeat:
I seriously doubt that this can be repeated without actual data, but I can attach the config.ini file and the traces.

ndb_error_report stuff

Attachment: ndb_error_report_20110816154315.tar.bz2 (application/x-bzip, text), 619 bytes.

cluster log, taken after ndb_mgmd start and ndbd start

Attachment: ndb_1_cluster.log.bz2 (application/x-bzip, text), 2.52 KiB.

cluster error log, taken accordingly

Attachment: ndb_2_error.log.bz2 (application/x-bzip, text), 665 bytes.

cluster output log

Attachment: ndb_2_out.log.bz2 (application/x-bzip, text), 15.26 KiB.

main trace log, containing last ndbd start with --initial parameter

Attachment: ndb_2_trace.log.8.bz2 (application/x-bzip, text), 51.41 KiB.

I also tried to start the data node with the --initial parameter. I saw no difference except some minor log changes.
The node crashed as usual.

Hi,

I think (pretty sure) that you (also) found Bug#62116
which by now can be considered one of the worst bugs we have had
in a few years :-(

If you compile yourself, you can try the patch.
Otherwise, we are working on new binaries,
which I strongly suggest changing to...

/Jonas

Thanks, I rebuilt from source. The patch did help.

But after patching and a rolling restart, the 'memoryusage' report always shows that data usage differs between the two nodes. I have two nodes, and NoOfReplicas is also 2. Does this mean I lose some data across rolling restarts?

The log tells me that all is fine. However, data usage can differ by 1-3% across restarts without any change in the amount of memory.
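To put a number on such a difference, the gap between two nodes can be computed from their page counts. This is only a minimal sketch: the function names are made up for illustration, and the page counts are hypothetical sample figures, not values taken from this cluster.

```python
def usage_percent(used_pages, total_pages):
    """Data memory usage as a percentage of the total pages."""
    return 100.0 * used_pages / total_pages


def usage_gap(node_a, node_b):
    """Absolute difference in usage between two nodes, in percentage points."""
    return abs(usage_percent(*node_a) - usage_percent(*node_b))


# Hypothetical (used_pages, total_pages) figures for two data nodes.
node2 = (1770, 3200)
node3 = (1834, 3200)

gap = usage_gap(node2, node3)
print(f"{gap:.1f} percentage points")
```

The sketch only measures the gap. It says nothing about whether a given gap indicates missing data.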

The issue you hit is solved by the patch (bug closed). As for the difference in usage between data nodes, that's normal/expected.

take care
Bogdan Kecman