Description:
I followed these steps to restore a backup on an NDB cluster:
1. Shut down mysqld.
2. Restart all data nodes with --initial and restore the backup on these nodes.
3. Restart mysqld.
After that, MySQL gets stuck during the restart and eventually times out with this error:
[ERROR] [MY-010865] [NDB] Tables not available after 120 seconds. Consider increasing --ndb-wait-setup value
I looked into the code and found that mysqld is stuck in wait_setup_completed(), waiting for ndb_index_stat_thread.is_setup_complete() to return true.
At the same time, ndb_index_stat_thread keeps failing and retrying in Ndb_index_stat_thread::do_run() because check_sysevents() fails.
The root cause of the check_sysevents() failure is that it can’t find the event NDB_INDEX_STAT_HEAD_EVENT.
I dug deeper into how this event is created:
On MySQL startup, Ndb_binlog_setup::setup() checks whether mysql.ndb_index_stat_head exists:
* If not, it creates the table and the event NDB_INDEX_STAT_HEAD_EVENT.
* If yes, it skips handling both the table and the event.
On ndb_restore, BackupRestore::handle_index_stat_tables checks whether mysql.ndb_index_stat_head exists:
* If not, it creates only the table (but not the event).
* If yes, it skips.
In this case, the data nodes were restarted with --initial, so the table does not exist when the restore runs. ndb_restore therefore creates the table mysql.ndb_index_stat_head but does not create the event NDB_INDEX_STAT_HEAD_EVENT.
When mysqld restarts, it sees that the table already exists and skips creating both the table and the event.
As a result, the event is never created, causing mysqld to get stuck waiting for it to appear.
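To make the sequence above easier to follow, here is a small toy model of the two code paths. It is only a sketch of the behavior described in this report, not the actual MySQL or ndb_restore code: ToyCluster, mysqld_setup, ndb_restore, check_sysevents and simulate are hypothetical names invented for the illustration, and the NDB dictionary is reduced to two sets of names.

```python
TABLE = "mysql.ndb_index_stat_head"
EVENT = "NDB_INDEX_STAT_HEAD_EVENT"


class ToyCluster:
    """Hypothetical stand-in for the NDB dictionary: tracks table and event names only."""

    def __init__(self):
        # Empty, like data nodes that were just restarted with --initial.
        self.tables = set()
        self.events = set()


def mysqld_setup(cluster):
    """Models the behavior described for Ndb_binlog_setup::setup()."""
    if TABLE not in cluster.tables:
        cluster.tables.add(TABLE)
        cluster.events.add(EVENT)
    # If the table already exists, both the table and the event are skipped.


def ndb_restore(cluster):
    """Models the behavior described for BackupRestore::handle_index_stat_tables."""
    if TABLE not in cluster.tables:
        cluster.tables.add(TABLE)  # only the table, not the event


def check_sysevents(cluster):
    """Models the check that the index stat thread keeps retrying."""
    return EVENT in cluster.events


def simulate(restore_first):
    """Return True if the event exists after mysqld starts."""
    cluster = ToyCluster()
    if restore_first:
        ndb_restore(cluster)
    mysqld_setup(cluster)
    return check_sysevents(cluster)


print(simulate(restore_first=False))  # True: mysqld creates table and event
print(simulate(restore_first=True))   # False: table exists, event never created
```

In the second case the check can never succeed, no matter how often it is retried, which matches the timeout seen in wait_setup_completed().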
How to repeat:
1. Shut down mysqld.
2. Restart all data nodes with --initial and restore the backup on these nodes.
3. Restart mysqld.
Suggested fix:
The issue can be worked around by applying specific restore steps.
But I think it should also be addressed at a lower level, so that the table and its event are always created consistently and this issue cannot occur at all.
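One possible shape for such a lower-level fix is to check for the table and for the event independently, instead of treating "table exists" as proof that the event exists too. The sketch below illustrates only that idea on a toy model. The function ensure_index_stat_head and the use of plain sets are assumptions made for the example, not existing MySQL code, and the real fix might equally belong in ndb_restore.

```python
TABLE = "mysql.ndb_index_stat_head"
EVENT = "NDB_INDEX_STAT_HEAD_EVENT"


def ensure_index_stat_head(tables, events):
    """Hypothetical setup step: create whichever of the two objects is missing."""
    if TABLE not in tables:
        tables.add(TABLE)
    # Checked separately, so a table created by ndb_restore still gets its event.
    if EVENT not in events:
        events.add(EVENT)


# State after the restore described in this report: table present, event missing.
tables_after_restore = {TABLE}
events_after_restore = set()

ensure_index_stat_head(tables_after_restore, events_after_restore)
print(EVENT in events_after_restore)  # True
```

With a check of this kind the setup step is idempotent: running it on a fresh cluster, on a restored cluster, or on a fully initialized one ends in the same state.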