MySQL Bugs: #62188: data nodes chaincrash while extensive deleting

Bug #62188	data nodes chaincrash while extensive deleting
Submitted:	17 Aug 2011 5:10	Modified:	19 Oct 2016 22:59
Reporter:	Eugene Zheganin
Status:	Duplicate
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S2 (Serious)
Version:	7.1.15	OS:	Linux (Debian/Squeeze amd64)
Assigned to:		CPU Architecture:	Any

Description:
While I was issuing DELETE statements, each removing 80K-500K rows, one of the data nodes went down; about a minute later the other one crashed. The cluster was also under load from web clients, which were issuing INSERTs and SELECTs; here I'm only describing my own activity in the mysql console. However, this cluster has been running under the same load (apart from my DELETE statements) for days, so these DELETEs were most probably the cause of the crash.

Cluster configuration: 
2 servers, running 2 data, 2 SQL and one MGM node.

Ndb_mgm console log attached.
Mysql console log attached.
Traces, outs, errors, and various other stuff attached.

Timestamps in logs:

The first data node crash occurred at:

2011-08-16 22:48:33 [MgmtSrvr] ALERT    -- Node 3: Forced node shutdown completed. Caused by error 2303: 'System error, node killed during node restart by other node(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

The second and last one:

2011-08-16 22:52:59 [MgmtSrvr] ALERT    -- Node 2: Forced node shutdown completed. Caused by error 2303: 'System error, node killed during node restart by other node(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

How to repeat:
I don't know if it's repeatable. I hope not. Having seen lots of 'please report a bug' replies on the forum to posts about 'please report a bug' messages, I decided to report a bug. I hope this helps.

Bunch of files needed to investigate the bug

Attachment: chaincrash.tar.gz (application/x-gzip, text), 425.22 KiB.

Hi

This is a duplicate of Bug#62116.
We have fixed it and are in the process of releasing a new version.
There is a patch attached to Bug#62116, if you build it yourself.

/Jonas

Unfortunately, the clocks on the data nodes are not in sync with each other, although the nodes are on the same gigabit LAN.
But as far as I can see, they report the time from the MGM node, since the timestamps in their logs are identical.

Oh. Thanks a lot.

But... I had this patch applied when this crash happened.
The patch definitely helped, because without it I could not even start the data nodes.

I have the patch applied on both servers.

hmm...sorry...

I just found that bug in the error.log, and stopped looking...

setting back to open

/Jonas

Yup, must be some earlier startup just before the patch and rebuild.

Sorry for such logs, but I know that developers don't like edited log files.

Seems like it's reproducible.
Got it for the second time.

- can't get the logs any more (the issue is too old) to be 100% sure
- looks like a GCP stop crash
- these types of crashes were easily reproducible in old versions
- the trigger is "doing a huge transaction" all at once (e.g. deleting a lot of records from a table in one statement)

- all these issues are solved by
 * upgrading to the latest version of any GA branch
 * properly configuring the cluster
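
Since the suspected trigger above is a single huge transaction, one common workaround (not something proposed in this report, and not a replacement for upgrading) is to split a large delete into many small transactions. Below is a minimal sketch in Python, assuming a DB-API 2.0 style connection to one of the SQL nodes. The function name delete_in_batches, the table name, the WHERE clause and the batch size are all hypothetical and only illustrate the idea; it relies on MySQL's single-table DELETE ... LIMIT syntax.

```python
import time


def delete_in_batches(conn, table, where, batch_size=10000, pause=0.0):
    """Delete rows matching `where` in many small transactions.

    `conn` is assumed to be a DB-API 2.0 connection (cursor(), commit()).
    `table` and `where` are interpolated into the SQL text as-is, so they
    must come from trusted input only.
    Returns the total number of rows deleted.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")

    # MySQL allows LIMIT on a single-table DELETE, which caps the number
    # of rows touched by each statement (and therefore each transaction).
    sql = f"DELETE FROM {table} WHERE {where} LIMIT {int(batch_size)}"

    total = 0
    while True:
        cur = conn.cursor()
        try:
            cur.execute(sql)
            deleted = cur.rowcount
        finally:
            cur.close()
        conn.commit()  # end the small transaction before starting the next

        total += max(deleted, 0)
        if deleted < batch_size:
            # A short (or empty) batch means nothing is left to delete.
            return total
        time.sleep(pause)  # optional breathing room for the data nodes
```

With a real driver this might be called as delete_in_batches(conn, "some_table", "created < '2011-01-01'", batch_size=5000), where the table and condition are placeholders. A suitable batch size depends on the cluster configuration, so treat the default here as an arbitrary starting point rather than a recommendation.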