MySQL Bugs: #119785: After cluster restart, mysql.ndb_schema creation fails on SQL nodes (ERROR 1146) but NDB dictionary shows it exists.

Bug #119785	After cluster restart, mysql.ndb_schema creation fails on SQL nodes (ERROR 1146) but NDB dictionary shows it exists.
Submitted:	27 Jan 1:52
Reporter:	CunDi Fang
Status:	Open
Category:	MySQL Cluster: Cluster/J	Severity:	S3 (Non-critical)
Version:	MySQL NDB Cluster 9.3.0-cluster	OS:	Ubuntu (22.04)
Assigned to:		CPU Architecture:	Any
Tags:	cluster failure, data dictionary, metadata sync, ndb, ndb_binlog, ndb_schema, restart, util tables

Description:
In a multi-node NDB Cluster 9.3.0 setup with multiple SQL nodes, a data node failure led to a cluster failure. The SQL node error log shows the data node failure and the cluster failure, immediately followed by:

Binlog: The util tables has been lost, restarting thread

The binlog thread then restarts and waits for the cluster to start.

After the restart sequence, the binlog thread reports it needs to reinstall util tables in DD and attempts to recreate them, e.g.:

Table 'mysql.ndb_schema' need reinstall in DD

Removing 'mysql.ndb_schema' from DD

Creating table 'mysql.ndb_schema'

Binlog: logging ./mysql/ndb_schema (UPDATED,USE_WRITE) 

However, after the cluster is back up, the SQL layer on every SQL node (ndb1..ndb4) behaves as follows:

SHOW TABLES FROM mysql LIKE 'ndb_%' returns only ndb_apply_status and ndb_binlog_index.

On all SQL nodes, SHOW CREATE TABLE mysql.ndb_schema fails with: ERROR 1146 (42S02): Table 'mysql.ndb_schema' doesn't exist

At the same time, the NDB dictionary (queried via ndbinfo) shows that the util tables do exist, including mysql.ndb_schema, mysql.ndb_schema_result, and mysql.ndb_sql_metadata (status Retrieved, etc.).

Additionally, performance_schema.threads shows that the NDB background threads are present (e.g., thread/ndbcluster/ndb_binlog), with ndb_binlog in state 'waiting for handler commit' and ndb_purger in state 'closing tables'.

This looks like a persistent inconsistency between SQL layer visibility / DD state and NDB dictionary state for the util tables after cluster failure/restart, and the automatic “reinstall/create” sequence does not restore mysql.ndb_schema to a usable state.
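
To make the log sequence described above easier to spot in a long error log, here is a minimal Python sketch that scans log lines for the messages quoted in this report. The function name `summarize_util_table_recovery` and the regular expressions are illustrative assumptions; they match only the message fragments shown above and may need adjusting for the full error-log line format.

```python
import re

# Message fragments copied from the error-log excerpts quoted in this report.
LOST_MARKER = "The util tables has been lost"
REINSTALL_RE = re.compile(r"Table '(mysql\.ndb_[a-z_]+)' need reinstall in DD")
CREATE_RE = re.compile(r"Creating table '(mysql\.ndb_[a-z_]+)'")


def summarize_util_table_recovery(log_lines):
    """Count 'util tables lost' events and list tables reinstalled/created."""
    summary = {"lost_events": 0, "reinstall": [], "created": []}
    for line in log_lines:
        if LOST_MARKER in line:
            summary["lost_events"] += 1
            continue
        match = REINSTALL_RE.search(line)
        if match:
            summary["reinstall"].append(match.group(1))
            continue
        match = CREATE_RE.search(line)
        if match:
            summary["created"].append(match.group(1))
    return summary
```

A table that appears under "created" but still fails with ERROR 1146 in the SQL layer is the symptom reported here.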

How to repeat:
Deploy an NDB Cluster 9.3.0 with multiple data nodes and multiple SQL nodes.

Start the cluster normally and ensure SQL nodes are connected.

Trigger a cluster failure (e.g., cause a data node failure; observe “Data node X failed” and “cluster failure at epoch …”). 

Observe SQL node error log: Binlog: The util tables has been lost, restarting thread. 

After recovery/restart, on each SQL node run:

SHOW TABLES FROM mysql LIKE 'ndb_%';

SHOW CREATE TABLE mysql.ndb_schema\G
Expected: util tables exist and are accessible.
Actual: mysql.ndb_schema is missing (ERROR 1146) on all SQL nodes. 

Compare with NDB dictionary: query ndbinfo.dictionary_tables (or equivalent) and observe mysql.ndb_schema present while SQL layer cannot access it.
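
The comparison in the last step can be scripted. The sketch below assumes a Python DB-API cursor connected to one SQL node (the driver is not specified here) and assumes that ndbinfo.dictionary_tables exposes `database_name` and `table_name` columns, as its documentation describes; `find_divergent_tables` is a hypothetical helper name.

```python
# Same statement as in the steps above; run without parameters so that the
# '%' wildcard is passed to the server unchanged.
SQL_LAYER_QUERY = "SHOW TABLES FROM mysql LIKE 'ndb_%'"

# Assumed column names: database_name, table_name.
NDB_DICT_QUERY = (
    "SELECT table_name FROM ndbinfo.dictionary_tables "
    "WHERE database_name = 'mysql' AND table_name LIKE 'ndb_%'"
)


def find_divergent_tables(cursor):
    """Return tables listed in the NDB dictionary but invisible to SQL."""
    cursor.execute(SQL_LAYER_QUERY)
    sql_visible = {row[0] for row in cursor.fetchall()}

    cursor.execute(NDB_DICT_QUERY)
    in_ndb_dictionary = {row[0] for row in cursor.fetchall()}

    # Tables NDB knows about that the SQL layer cannot see.
    return sorted(in_ndb_dictionary - sql_visible)
```

On an affected SQL node this would be expected to return ndb_schema among the divergent tables; on a healthy node the list should be empty.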

Suggested fix:
Improve the robustness of the util tables recovery path after cluster failure/restart: if ndb_binlog detects util tables lost and performs “reinstall in DD / create table”, it should verify that the SQL layer can open the table afterwards (not just log “Creating table”).

If verification fails (e.g., opening mysql.ndb_schema still returns ERROR 1146), retry with a stronger reconciliation strategy (e.g., fully dropping stale metadata/cache entries and re-creating the util tables deterministically), and emit a clear diagnostic explaining why the NDB dictionary and the SQL layer diverged.
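
The control flow suggested above can be illustrated with a small Python model. This is only a sketch of the verify-then-escalate idea, not the server's actual code: `create`, `can_open`, and `hard_reset` are hypothetical callables standing in for the existing create step, the proposed open check, and the stronger reconciliation step.

```python
def recover_util_table(create, can_open, hard_reset, max_attempts=3):
    """Recreate a util table and confirm the SQL layer can open it.

    Hypothetical hooks:
      create()     - the normal "reinstall in DD / create table" step
      can_open()   - True if the SQL layer can open the table afterwards
      hard_reset() - stronger reconciliation (e.g. drop stale metadata)
    Returns the number of attempts used; raises RuntimeError on failure.
    """
    for attempt in range(1, max_attempts + 1):
        create()
        if can_open():
            return attempt
        # Verification failed: escalate before the next attempt.
        hard_reset()
    raise RuntimeError(
        f"util table still cannot be opened after {max_attempts} attempts; "
        "NDB dictionary and SQL layer remain out of sync"
    )
```

The point of the model is that logging "Creating table" is not treated as success; only a successful open ends the loop, and exhausting the attempts produces an explicit diagnostic instead of a silent inconsistency.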