Description:
In a multi-node NDB Cluster 9.3.0 setup with multiple SQL nodes, after a data node failure leading to a cluster failure, the SQL node error log shows:
Data node failure and cluster failure, immediately followed by:
"Binlog: The util tables has been lost, restarting thread", after which the binlog thread restarts and waits for the cluster to start.
After the restart sequence, the binlog thread reports that it needs to reinstall the util tables in the DD (data dictionary) and attempts to recreate them, e.g.:
Table 'mysql.ndb_schema' need reinstall in DD
Removing 'mysql.ndb_schema' from DD
Creating table 'mysql.ndb_schema'
Binlog: logging ./mysql/ndb_schema (UPDATED,USE_WRITE)
However, once the cluster is up again, the SQL layer on every SQL node (ndb1..ndb4) shows the following:
SHOW TABLES FROM mysql LIKE 'ndb_%' returns only ndb_apply_status and ndb_binlog_index.
SHOW CREATE TABLE mysql.ndb_schema fails on all SQL nodes with: ERROR 1146 (42S02): Table 'mysql.ndb_schema' doesn't exist.
At the same time, the NDB dictionary (queried via ndbinfo) shows that the util tables do exist, including mysql.ndb_schema, mysql.ndb_schema_result and mysql.ndb_sql_metadata (status Retrieved, etc.).
Additionally, performance_schema.threads shows that the NDB background threads are present (e.g., thread/ndbcluster/ndb_binlog), with ndb_binlog in state “waiting for handler commit” and ndb_purger in state “closing tables”.
This looks like a persistent inconsistency between SQL-layer visibility/DD state and NDB dictionary state for the util tables after a cluster failure and restart; the automatic “reinstall/create” sequence does not restore mysql.ndb_schema to a usable state.
How to repeat:
Deploy an NDB Cluster 9.3.0 with multiple data nodes and multiple SQL nodes.
Start the cluster normally and ensure SQL nodes are connected.
Trigger a cluster failure (e.g., cause a data node failure; observe “Data node X failed” and “cluster failure at epoch …”).
Observe SQL node error log: Binlog: The util tables has been lost, restarting thread.
After recovery/restart, on each SQL node run:
SHOW TABLES FROM mysql LIKE 'ndb_%';
SHOW CREATE TABLE mysql.ndb_schema\G
Expected: util tables exist and are accessible.
Actual: mysql.ndb_schema is missing (ERROR 1146) on all SQL nodes.
Compare with the NDB dictionary: query ndbinfo.dictionary_tables (or equivalent) and observe that mysql.ndb_schema is present even though the SQL layer cannot access it.
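The comparison in the last step can be scripted once both table lists have been collected. This is a minimal sketch, assuming the names have already been fetched (for example from SHOW TABLES FROM mysql LIKE 'ndb_%' on an SQL node and from the NDB dictionary query) as bare table names; the function name find_sql_layer_gaps and its input format are hypothetical and not part of any MySQL tool.

```python
def find_sql_layer_gaps(ndb_dict_tables, sql_layer_tables, db="mysql", prefix="ndb_"):
    """Return util tables that the NDB dictionary knows about but the SQL layer cannot see.

    Hypothetical helper: both arguments are iterables of bare table names,
    e.g. "ndb_schema", collected beforehand by the caller.
    """
    # Only consider the ndb_* util tables from the NDB dictionary side.
    in_ndb = {t for t in ndb_dict_tables if t.startswith(prefix)}
    visible = set(sql_layer_tables)
    # Tables present in NDB but missing from SHOW TABLES are the divergence.
    return sorted(f"{db}.{t}" for t in in_ndb - visible)


# Values taken from this report.
gaps = find_sql_layer_gaps(
    ["ndb_schema", "ndb_schema_result", "ndb_sql_metadata"],
    ["ndb_apply_status", "ndb_binlog_index"],
)
print(gaps)  # ['mysql.ndb_schema', 'mysql.ndb_schema_result', 'mysql.ndb_sql_metadata']
```

A non-empty result on every SQL node is the symptom described above.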
Suggested fix:
Improve the robustness of the util tables recovery path after a cluster failure and restart: if the ndb_binlog thread detects that the util tables were lost and performs the “reinstall in DD / create table” sequence, it should afterwards verify that the SQL layer can actually open each table, rather than only logging “Creating table”.
If verification fails (e.g., mysql.ndb_schema still returns ERROR 1146), retry with a stronger reconciliation strategy (e.g., fully dropping stale metadata/cache entries and recreating the util tables deterministically), and emit a clear diagnostic explaining why the NDB dictionary and the SQL layer diverged.
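The verify-then-retry flow suggested above can be sketched in a language-neutral way. This is only an illustration under stated assumptions: the real logic would live inside the ndbcluster plugin in C++, and every callable passed in here (reinstall, can_open, reconcile, log) is a hypothetical hook standing in for internals, not an existing MySQL API.

```python
from typing import Callable


def reinstall_util_table(
    name: str,
    reinstall: Callable[[str], None],  # hypothetical: the existing "reinstall in DD / create" step
    can_open: Callable[[str], bool],   # hypothetical: can the SQL layer open the table?
    reconcile: Callable[[str], None],  # hypothetical: drop stale DD/cache entries, then recreate
    log: Callable[[str], None],        # hypothetical: write to the error log
    max_attempts: int = 3,
) -> bool:
    """Reinstall a util table, then confirm the SQL layer can really open it."""
    reinstall(name)
    if can_open(name):
        return True
    # Verification failed: escalate to the stronger reconciliation path.
    for attempt in range(1, max_attempts + 1):
        log(f"'{name}' not openable after reinstall; reconciliation attempt {attempt}")
        reconcile(name)
        if can_open(name):
            return True
    # Still diverged: emit an explicit diagnostic instead of failing silently.
    log(f"'{name}' exists in NDB dictionary but the SQL layer still cannot open it")
    return False
```

Bounding the retries keeps the binlog thread from looping forever, and the final log line gives the clear diagnostic this report asks for when reconciliation cannot fix the divergence.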