Bug #91161 NDB crash on Check createTableVersion == lqhCreateTableVersion || lqhCreateTable
Submitted: 6 Jun 2018 14:31 Modified: 13 Jun 2018 3:45
Reporter: Daniël van Eeden (OCA) Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S3 (Non-critical)
Version:7.6.6 OS:Any
Assigned to: Bogdan Kecman CPU Architecture:Any

[6 Jun 2018 14:31] Daniël van Eeden
Description:
2018-06-06 16:16:55 [ndbd] INFO     -- /export/home2/pb2/build/sb_1-28710439-1526985206.38/rpm/BUILD/mysql-cluster-gpl-7.6.6/mysql-cluster-gpl-7.6.6/storage/ndb/src/kernel/blocks/backup/Backup.cpp
2018-06-06 16:16:55 [ndbd] INFO     -- BACKUP (Line: 11209) 0x00000002 Check createTableVersion == lqhCreateTableVersion || lqhCreateTableVersion == 0 failed
2018-06-06 16:16:55 [ndbd] INFO     -- Error handler shutting down system
2018-06-06 16:16:56 [ndbd] INFO     -- Error handler shutdown completed - aborting
2018-06-06 16:17:10 [ndbd] INFO     -- Child process terminated by signal 6
2018-06-06 16:17:10 [ndbd] ALERT    -- Node 26: Forced node shutdown completed. Initiated by signal 6. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

Time: Wednesday 6 June 2018 - 16:16:55
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: Backup.cpp
Error object: BACKUP (Line: 11209) 0x00000002 Check createTableVersion == lqhCreateTableVersion || lqhCreateTableVersion == 0 failed
Program: ndbmtd
Pid: 49603 thr: 7
Version: mysql-5.7.22 ndb-7.6.6
Trace file name: ndb_26_trace.log.18_t7
Trace file path: /var/lib/mysql-cluster/ndb_26_trace.log.18 [t1..t17]
***EOM***
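The failed check can be read as an invariant: the table version recorded for the backup/LCP must match the version LQH knows for that table, unless LQH has no version recorded at all (0). A minimal illustrative model follows; the function name is hypothetical and this is not the actual NDB source, just the boolean condition from the error message.

```python
# Illustrative model (hypothetical name, not NDB source) of the invariant
# asserted in Backup.cpp:
#   createTableVersion == lqhCreateTableVersion || lqhCreateTableVersion == 0
def table_version_consistent(create_table_version: int,
                             lqh_create_table_version: int) -> bool:
    return (create_table_version == lqh_create_table_version
            or lqh_create_table_version == 0)

print(table_version_consistent(3, 3))  # matching versions: check passes
print(table_version_consistent(3, 0))  # LQH has no version yet: check passes
print(table_version_consistent(3, 2))  # stale version: ndbrequire fails
```

When the condition evaluates false, the ndbrequire fires and the data node shuts itself down with error 2341, which is the forced shutdown seen in the log above.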

How to repeat:
Not known yet. It happened on startup of a cluster during a 7.6.4 to 7.6.6 upgrade.
[11 Jun 2018 13:20] Bogdan Kecman
Hi Daniel,

> Happened on startup of a cluster for a 7.6.4 to 7.6.6 upgrade.

How was the upgrade performed? This looks like it happened during "START BACKUP" and not during the upgrade?

all best
Bogdan
[12 Jun 2018 20:18] Daniël van Eeden
I started to do a rolling upgrade.

From memory:
I started with node 48, which went ok.
Then I did node 47, which crashed (maybe Bug #90606)
Then I did node 46, which went ok.
Then replication to the cluster stopped (see Bug #91160).
Then I did a full cluster shutdown and start. I think that was when this crash happened on node 26.

I think I had to start node 47 and one other node (26?) with --initial to get the cluster 100% healthy again.
[13 Jun 2018 3:45] Bogdan Kecman
Hi,
Thanks for the explanation. The new code has some additional logging, so if this happens again (with newer code) we'll have more info to go on. But with the info we have now there's nothing we can do to find out why it happened, and I can't reproduce this using your flow.

all best
Bogdan
[14 Jun 2018 5:29] Mikael Ronström
This can happen in the following scenario:
1) Node X stops
2) Drop table
3) Cluster restart including start of node X
4) Create table
5) Crash

The reason is that the DROP TABLE isn't performed as part of the cluster restart in node X.
This leaves LCP files from the old table behind, and when this table id is reused by
a new table we discover an anomaly in the LCP execution code.
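The scenario above can be sketched as a toy model. All names here are hypothetical and this is not the actual NDB code: it only shows how an LCP file that survives a missed DROP TABLE carries a stale table version that surfaces once the table id is reused.

```python
# Toy model (hypothetical names, not NDB source) of the crash scenario:
# a node that misses a DROP TABLE keeps the old table's LCP file, and the
# stale version is detected when the table id is reused by a new table.

lcp_files = {}       # table_id -> table version stored in the LCP file on disk
cluster_tables = {}  # table_id -> current table version in the dictionary

def create_table(table_id, version):
    cluster_tables[table_id] = version
    lcp_files[table_id] = version          # LCP later written for this table

def drop_table(table_id, node_is_up=True):
    del cluster_tables[table_id]
    if node_is_up:
        lcp_files.pop(table_id, None)      # if node X is down, file survives

create_table(7, version=1)                 # 1) node X stops with this LCP file
drop_table(7, node_is_up=False)            # 2) DROP TABLE while node X is down
cluster_tables[7] = 2                      # 3+4) restart; table id 7 is reused

# 5) LCP execution finds version 1 on disk but version 2 in the dictionary
stale = lcp_files[7] != cluster_tables[7] and lcp_files[7] != 0
print(stale)  # True -> the ndbrequire fires and the node crashes
```

In the model, a node that is up when the table is dropped deletes the LCP file, so the check never fires on a normal drop/create cycle.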

We will implement two solutions to this problem.

The short-term solution is not to crash in this situation, but simply to remove the
old files and create new ones.

The long-term solution is to ensure that cluster restarts properly clean up old dropped tables,
even when the node was down while the DROP TABLE occurred.