Bug #118337 MySQL stuck during restart after restoring backup on NDB data nodes
Submitted: 3 Jun 9:26
Modified: 11 Jun 16:16
Reporter: ZHAO SONG
Status: Verified
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S2 (Serious)
Version:
OS: Any
Assigned to:
CPU Architecture: Any

[3 Jun 9:26] ZHAO SONG
Description:
I followed these steps to restore a backup on an NDB cluster:

1. Shut down mysqld.

2. Restart all data nodes with --initial and restore the backup on these nodes.

3. Restart mysqld.

After that, mysqld gets stuck during the restart and eventually times out with this error:

[ERROR] [MY-010865] [NDB] Tables not available after 120 seconds. Consider increasing --ndb-wait-setup value

I looked into the code and found that mysqld is stuck in wait_setup_completed(), waiting for ndb_index_stat_thread.is_setup_complete() to return true.
At the same time, ndb_index_stat_thread keeps failing and retrying in Ndb_index_stat_thread::do_run() because check_sysevents() fails.

The root cause of the check_sysevents() failure is that it can’t find the event NDB_INDEX_STAT_HEAD_EVENT.
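The hang described above can be illustrated with a small toy model. This is only a sketch, not the mysqld code: the Python functions borrow the names check_sysevents and wait_setup_completed for readability, the separate index stat thread is folded into the polling loop, and the short timeouts stand in for the 120 seconds from the error message.

```python
import time

EVENT = "NDB_INDEX_STAT_HEAD_EVENT"


def check_sysevents(existing_events):
    """Toy stand-in for the check: succeeds only if the event exists."""
    return EVENT in existing_events


def wait_setup_completed(existing_events, max_wait_seconds, retry_interval=0.01):
    """Poll until the check succeeds or the deadline passes.

    Returns True if setup completed, False on timeout.
    """
    deadline = time.monotonic() + max_wait_seconds
    while time.monotonic() < deadline:
        if check_sysevents(existing_events):
            return True
        # The check failed; retry after a short pause (assumed interval).
        time.sleep(retry_interval)
    return False
```

If the event is never created, every retry fails in the same way, so raising the wait limit only delays the timeout.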

I dug deeper into how this event is created:

On mysqld startup:
Ndb_binlog_setup::setup() checks whether mysql.ndb_index_stat_head exists:

 * If not, create the table and the event NDB_INDEX_STAT_HEAD_EVENT.

 * If yes, skip handling both the table and the event.

On ndb_restore:
BackupRestore::handle_index_stat_tables checks whether mysql.ndb_index_stat_head exists:

 * If not, create only the table (but not the event).

 * If yes, skip.

In this case, during restore, ndb_restore creates the table mysql.ndb_index_stat_head but does not create the event NDB_INDEX_STAT_HEAD_EVENT.
When mysqld restarts, it sees that the table already exists and skips creating both the table and the event.

As a result, the event is never created, causing mysqld to get stuck waiting for it to appear.
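The interaction between the two code paths can be reproduced in a toy model. The sketch below is an illustration only: the Cluster class and the two Python functions are hypothetical stand-ins that follow the behavior described above, not the actual NDB implementation.

```python
TABLE = "mysql.ndb_index_stat_head"
EVENT = "NDB_INDEX_STAT_HEAD_EVENT"


class Cluster:
    """Toy model of the schema objects stored in the data nodes."""

    def __init__(self):
        self.tables = set()
        self.events = set()


def mysqld_setup(cluster):
    """Models the described Ndb_binlog_setup::setup() behavior."""
    if TABLE not in cluster.tables:
        cluster.tables.add(TABLE)
        cluster.events.add(EVENT)
    # If the table exists, both the table and the event are skipped.


def ndb_restore_index_stat_tables(cluster):
    """Models the described BackupRestore::handle_index_stat_tables behavior."""
    if TABLE not in cluster.tables:
        cluster.tables.add(TABLE)  # the table only, no event


# Data nodes restarted with --initial, then restore, then mysqld restart.
restored = Cluster()
ndb_restore_index_stat_tables(restored)
mysqld_setup(restored)
print(EVENT in restored.events)  # prints False: the event is never created

# For comparison: mysqld starting against an empty cluster without a restore.
fresh = Cluster()
mysqld_setup(fresh)
print(EVENT in fresh.events)  # prints True
```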

How to repeat:
1. Shut down mysqld.

2. Restart all data nodes with --initial and restore the backup on these nodes.

3. Restart mysqld.

Suggested fix:
The problem can be worked around by applying specific restore steps.
However, I think it should also be addressed at a lower level, so that the table and the event are handled consistently and this issue cannot occur at all.
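One possible shape for such a lower-level fix, sketched as a toy model: check the table and the event independently, so that a missing event is created even when the table already exists. This is an assumption about what a fix could look like, not the actual or planned change; setup_index_stat_objects is a hypothetical name.

```python
TABLE = "mysql.ndb_index_stat_head"
EVENT = "NDB_INDEX_STAT_HEAD_EVENT"


def setup_index_stat_objects(tables, events):
    """Create whichever of the table and the event is missing.

    `tables` and `events` are sets of existing object names and are
    updated in place. Returns the list of names created by this call.
    """
    created = []
    if TABLE not in tables:
        tables.add(TABLE)
        created.append(TABLE)
    # Checked separately from the table, so an existing table
    # no longer hides a missing event.
    if EVENT not in events:
        events.add(EVENT)
        created.append(EVENT)
    return created
```

With the state left behind by the restore (table present, event missing), the call creates only the event; calling it again creates nothing.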
[11 Jun 16:16] MySQL Verification Team
Thank you for the analysis.