MySQL Bugs: #118337: MySQL stuck during restart after restoring backup on NDB data nodes

Bug #118337	MySQL stuck during restart after restoring backup on NDB data nodes
Submitted:	3 Jun 9:26	Modified:	11 Jun 16:16
Reporter:	ZHAO SONG
Status:	Verified
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S2 (Serious)
Version:		OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
I followed these steps to restore a backup on an NDB cluster:

1. Shut down mysqld.

2. Restart all data nodes with --initial and restore the backup on these nodes.

3. Restart mysqld.

After that, MySQL gets stuck during the restart and eventually times out with this error:

[ERROR] [MY-010865] [NDB] Tables not available after 120 seconds. Consider increasing --ndb-wait-setup value

I looked into the code and found that mysqld is stuck in wait_setup_completed(), waiting for ndb_index_stat_thread.is_setup_complete() to return true.
At the same time, ndb_index_stat_thread keeps failing and retrying in Ndb_index_stat_thread::do_run() because check_sysevents() fails.
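To make the waiting behaviour concrete, here is a minimal Python model of a bounded wait loop. It is not the mysqld source, which is C++. The signature, the polling interval, and the injectable clock are assumptions for illustration; only the names wait_setup_completed and is_setup_complete come from the analysis above.

```python
import time


def wait_setup_completed(is_setup_complete, max_wait_seconds,
                         poll_interval=0.01,
                         clock=time.monotonic, sleep=time.sleep):
    """Poll is_setup_complete() until it returns True or the deadline passes.

    Returns True if setup completed in time, False on timeout.
    Hypothetical model only: parameters and polling strategy are assumed.
    """
    deadline = clock() + max_wait_seconds
    while clock() < deadline:
        if is_setup_complete():
            return True
        sleep(poll_interval)
    return False
```

If the index stat thread can never report completion, as in this bug, a loop of this shape returns False only after the full wait has elapsed, which would match the 120-second timeout in the error message.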

The root cause of the check_sysevents() failure is that it can’t find the event NDB_INDEX_STAT_HEAD_EVENT.

I dug deeper into how this event is created:

On mysqld startup, Ndb_binlog_setup::setup() checks whether mysql.ndb_index_stat_head exists:

* If not, it creates the table and the event NDB_INDEX_STAT_HEAD_EVENT.

* If yes, it skips both the table and the event.

During ndb_restore, BackupRestore::handle_index_stat_tables checks whether mysql.ndb_index_stat_head exists:

* If not, it creates only the table (but not the event).

* If yes, it skips.

In this case, during restore, ndb_restore creates the table mysql.ndb_index_stat_head but does not create the event NDB_INDEX_STAT_HEAD_EVENT.
When mysqld restarts, it sees that the table already exists and skips creating both the table and the event.

As a result, the event is never created, causing mysqld to get stuck waiting for it to appear.
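The interaction described above can be reproduced with a small Python model. This is a sketch, not the actual NDB code: the Cluster class and all function names other than check_sysevents are hypothetical, and the NDB dictionary is reduced to two sets of names.

```python
from dataclasses import dataclass, field

TABLE = "mysql.ndb_index_stat_head"
EVENT = "NDB_INDEX_STAT_HEAD_EVENT"


@dataclass
class Cluster:
    """Toy stand-in for the NDB dictionary: names of tables and events."""
    tables: set = field(default_factory=set)
    events: set = field(default_factory=set)


def restore_handle_index_stat_tables(cluster):
    # Models ndb_restore as described: create the table if it is missing,
    # but never create the event.
    if TABLE not in cluster.tables:
        cluster.tables.add(TABLE)


def mysqld_setup(cluster):
    # Models mysqld startup as described: the table check guards BOTH
    # the table and the event creation.
    if TABLE not in cluster.tables:
        cluster.tables.add(TABLE)
        cluster.events.add(EVENT)


def check_sysevents(cluster):
    # Models the check that the index stat thread keeps retrying.
    return EVENT in cluster.events


def run_scenario():
    cluster = Cluster()                        # data nodes started with --initial
    restore_handle_index_stat_tables(cluster)  # restore creates the table only
    mysqld_setup(cluster)                      # mysqld sees the table and skips
    return check_sysevents(cluster)
```

In this model run_scenario() returns False: the table exists but the event does not, so the check can never succeed no matter how often it is retried.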

How to repeat:
1. Shut down mysqld.

2. Restart all data nodes with --initial and restore the backup on these nodes.

3. Restart mysqld.

Suggested fix:
This can be worked around by following specific restore steps.
However, I think it should also be fixed at a lower level, to ensure consistency and prevent the issue altogether.
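One possible shape for a lower-level fix is to check the table and the event independently, so that a missing event is created even when the table already exists. The Python sketch below only illustrates that idea on plain sets. The function name ensure_index_stat_objects is hypothetical, and this is not a patch against the real code.

```python
TABLE = "mysql.ndb_index_stat_head"
EVENT = "NDB_INDEX_STAT_HEAD_EVENT"


def ensure_index_stat_objects(tables, events):
    """Create whichever of the table and the event is missing.

    tables and events are sets of names standing in for the NDB
    dictionary (an assumption for illustration). Returns the list of
    names created, so repeated calls are visibly idempotent.
    """
    created = []
    if TABLE not in tables:
        tables.add(TABLE)
        created.append(TABLE)
    # The event check is NOT nested under the table check.
    if EVENT not in events:
        events.add(EVENT)
        created.append(EVENT)
    return created
```

Starting from the post-restore state (table present, event missing), a call creates only the event, and a second call creates nothing.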

Thank you for the analysis