Bug #62188 data nodes chaincrash while extensive deleting
Submitted: 17 Aug 2011 5:10
Modified: 19 Oct 2016 22:59
Reporter: Eugene Zheganin
Status: Duplicate
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S2 (Serious)
Version: 7.1.15
OS: Linux (Debian/Squeeze amd64)
Assigned to:
CPU Architecture: Any

[17 Aug 2011 5:10] Eugene Zheganin
Description:
While I was issuing DELETE statements, each deleting 80K-500K rows, one of the data nodes went down; then, a minute later, another crashed. The cluster was also under load from web clients issuing inserts/selects; here I'm only describing my own activity in the mysql console. However, this cluster has been running under the same load (except for my delete statements) for days, so it's most likely that these DELETE statements were the cause of the crash.

Cluster configuration: 
2 servers, running 2 data, 2 SQL and one MGM node.

Ndb_mgm console log attached.
Mysql console log attached.
Traces, outs, errors, and various other stuff attached.

Timestamps in logs:

The first data node crash occurred at:

2011-08-16 22:48:33 [MgmtSrvr] ALERT    -- Node 3: Forced node shutdown completed. Caused by error 2303: 'System error, node killed during node restart by other node(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

second and the last one:

2011-08-16 22:52:59 [MgmtSrvr] ALERT    -- Node 2: Forced node shutdown completed. Caused by error 2303: 'System error, node killed during node restart by other node(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

How to repeat:
I don't know if it's repeatable. I hope not. Having seen lots of 'please report a bug' replies in the forum to posts about 'please report a bug' messages, I decided to report a bug. Hope this helps.
[17 Aug 2011 5:11] Eugene Zheganin
Bunch of files needed to investigate the bug

Attachment: chaincrash.tar.gz (application/x-gzip, text), 425.22 KiB.

[17 Aug 2011 5:13] Jonas Oreland
Hi

This is a duplicate of Bug#62116,
we have fixed it, and are in the process of releasing a new version.
There is a patch attached to Bug#62116, if you build it yourself.

/Jonas
[17 Aug 2011 5:16] Eugene Zheganin
Unfortunately, the data nodes have a time gap between them. However, they are on the same gigabit LAN.
But as far as I can see, they report the time from the MGM node, since the timestamps in their logs are identical.
[17 Aug 2011 5:16] Eugene Zheganin
Oh. Thanks a lot.
[17 Aug 2011 5:19] Eugene Zheganin
But... I had this patch applied when this crash happened.
The patch definitely helped, because without the patch I could not even start the data nodes.

I have the patch applied on both servers.
[17 Aug 2011 6:09] Jonas Oreland
hmm...sorry...

I just found that bug in the error.log, and stopped looking...

setting back to open

/Jonas
[17 Aug 2011 6:36] Eugene Zheganin
Yup, must be some earlier startup just before the patch and rebuild.

Sorry for such logs, but I know that developers don't like edited log files.
[22 Aug 2011 11:06] Eugene Zheganin
Seems like it's reproducible.
Got it for the second time.
[19 Oct 2016 22:59] MySQL Verification Team
- can't get the logs any more (the issue is too old) to be 100% sure
- looks like a GCP stop crash
- these types of crashes were easily reproducible in old versions
- the trigger is doing a huge transaction all at once (e.g. deleting a lot of records from a table in one statement)

- all these issues are solved by
 * upgrading to the latest version of any GA branch
 * properly configuring the cluster
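
For anyone hitting the same thing, one common way to avoid "a huge transaction all at once" is to split a large delete into small batches that are committed separately. The following is only a minimal sketch, not something taken from this report: the function name delete_in_batches, the batch size, and the table/condition in the usage note are hypothetical, and conn is assumed to be any Python DB-API connection to a MySQL server (single-table DELETE ... LIMIT is MySQL syntax).

```python
def delete_in_batches(conn, table, where, batch_size=1000):
    """Delete rows matching `where` in small, separately committed batches.

    Hypothetical helper: `conn` is assumed to be a DB-API connection to MySQL.
    `table` and `where` are interpolated into the SQL text, so they must come
    from trusted code, never from user input.
    Returns the total number of rows deleted.
    """
    sql = f"DELETE FROM {table} WHERE {where} LIMIT {int(batch_size)}"
    total = 0
    while True:
        cur = conn.cursor()
        try:
            cur.execute(sql)
            deleted = cur.rowcount
        finally:
            cur.close()
        # Commit each batch so no single transaction holds a huge number of rows.
        conn.commit()
        if deleted <= 0:
            break
        total += deleted
    return total
```

Usage would look like delete_in_batches(conn, "mytable", "created < '2011-01-01'", batch_size=5000), where the table name and condition are made up. A suitable batch size depends on the cluster configuration, so treat the default here as a placeholder.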