Bug #106588 Data nodes crashed with error 2341 (DBTC Check false failed)
Submitted: 27 Feb 2022 19:16
Modified: 1 Mar 2022 12:55
Reporter: Shawn Hogan
Status: Can't repeat
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S2 (Serious)
Version: 7.6.17
OS: SUSE
Assigned to: MySQL Verification Team
CPU Architecture: x86

[27 Feb 2022 19:16] Shawn Hogan
Had 3 (out of 8) data nodes fail with:

Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: DbtcMain.cpp
Error object: DBTC (Line: 1830) 0x00000002 Check false failed

The cluster remained functional (kind of): there were enough data nodes left to form a valid cluster, but the SQL nodes started returning a lot of: "MySQL fetch error [2014]: Commands out of sync; you can't run this command now"

I restarted one of the failed data nodes and, at the very end of its restart, it caused other data nodes to crash, which led to catastrophic cluster failure (all data nodes came down).

I am currently attempting to restart all 8 data nodes, so I figured I would file this bug report while waiting to see whether they come up successfully.  :(

How to repeat:
Not sure.

Suggested fix:

[27 Feb 2022 19:33] Shawn Hogan
An NDB error report was uploaded to:

[27 Feb 2022 19:49] Shawn Hogan
The node that was running and then crashed while the failed node was coming back online failed with this:

Time: Sunday 27 February 2022 - 10:50:27
Status: Temporary error, restart node
Message: Assertion (Internal error, programming error or missing error message, please report a bug)
Error: 2301
Error data: Invalid memory access: ptr (80059d8e 0x7f9d9fda7648) magic: (00000040 00000068) memroot: 0x7f9b9fc40000 page: 68
Error object: DBSPJ (Line: 80) 0x00000002
Program: ndbmtd
Pid: 19795 thr: 0
Version: mysql-5.7.33 ndb-7.6.17
Trace file name: ndb_17_trace.log.2
Trace file path: /var/mysql-cluster/ndb_17_trace.log.2 [t1..t7]
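
When comparing entries like the one above across several nodes, it can help to pull the fields out programmatically. The following is a minimal sketch, assuming each entry is a series of "Key: value" lines as pasted in this report. The function names `parse_ndb_error_entry` and `parse_error_object` are hypothetical helpers and are not part of any NDB tool.

```python
import re

def parse_ndb_error_entry(text):
    """Split the 'Key: value' lines of one pasted error-log entry into a dict.

    Only the first ': ' on each line is treated as the separator, so values
    that themselves contain colons (e.g. 'Error data') are kept whole.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields

def parse_error_object(value):
    """Return (block name, source line) from an 'Error object' value, or None."""
    m = re.match(r"(\w+) \(Line: (\d+)\)", value)
    return (m.group(1), int(m.group(2))) if m else None

# Sample taken from the entry pasted in this report.
sample = """Status: Temporary error, restart node
Error: 2301
Error object: DBSPJ (Line: 80) 0x00000002
Program: ndbmtd
Version: mysql-5.7.33 ndb-7.6.17"""

entry = parse_ndb_error_entry(sample)
block, line_no = parse_error_object(entry["Error object"])
print(entry["Error"], block, line_no)
```

The same helper applies to the first failure's value, "DBTC (Line: 1830) 0x00000002 Check false failed", which makes it easy to see at a glance that the two crashes originate in different kernel blocks.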

[1 Mar 2022 12:55] MySQL Verification Team
Hi Shawn,

I cannot reproduce this. The logs do not provide enough data to determine exactly what happened. We could do a code review, but since you are not running the latest version, that would not make much sense. My advice is to upgrade to 7.6.21 and let us know if this happens again.

All best