Description:
During a cluster instability window (multiple data nodes disconnected/failed, cluster restarted repeatedly), all SQL nodes (mysqld) emitted repeated errors of the form:
[NDB] NdbInfo::openTable failed for ./ndbinfo/...
This was seen for multiple ndbinfo tables; examples observed in the logs include:
./ndbinfo/dictionary_tables
./ndbinfo/dictionary_columns
./ndbinfo/ndb@xxxxconfig_nodes
./ndbinfo/ndb@xxxxconfig_params
./ndbinfo/ndb@xxxxconfig_values
Within a 5-minute time window, each mysqld node logged the same number of occurrences, which suggests a periodic retry loop without effective backoff. The message itself also carries no underlying error code or reason, which limits diagnosability. After the cluster recovered, ndbinfo queries worked normally again.
Evidence / data:
Cluster-side validation that ndbinfo is accessible after recovery (executed on SQL node ndb3):
SELECT NOW();
SELECT COUNT(*) FROM ndbinfo.dictionary_tables;
SELECT * FROM ndbinfo.config_nodes LIMIT 3;
Output:
NOW()
2026-01-27 02:02:06
COUNT(*)
15
node_id node_type node_hostname
1 MGM ndb-mgmd
2 NDB ndb1
3 NDB ndb2
Count of "NdbInfo::openTable failed" messages per node container during the unstable window.
Time window: 2026-01-22T09:50:30Z to 2026-01-22T09:55:30Z
Command and output:
for c in ndb1 ndb2 ndb3 ndb4; do
echo "== $c =="
docker logs $c --since '2026-01-22T09:50:30Z' --until '2026-01-22T09:55:30Z' 2>/dev/null \
| grep -c "NdbInfo::openTable failed" || true
done
== ndb1 ==
15
== ndb2 ==
15
== ndb3 ==
15
== ndb4 ==
15
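The counts above show how many times the message appeared, but not its spacing. A follow-up along the lines of the sketch below could confirm whether the retries really are periodic and at what interval; it assumes the same container names as above and relies on docker logs -t prefixing every line with an RFC 3339 timestamp (the grep pattern is the same as in the count command).
for c in ndb1 ndb2 ndb3 ndb4; do
  echo "== $c =="
  # Print the timestamp of each occurrence; even spacing indicates a fixed retry period.
  docker logs -t "$c" --since '2026-01-22T09:50:30Z' --until '2026-01-22T09:55:30Z' 2>/dev/null \
    | grep "NdbInfo::openTable failed" \
    | awk '{print $1}'
done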
Probe result confirming ndbinfo.dictionary_tables is stable when the cluster is healthy
(1 Hz sampling; representative excerpt):
for i in $(seq 1 120); do
date -Ins
docker exec -i ndb3 bash -lc 'mysql -uroot -e "SELECT COUNT(*) FROM ndbinfo.dictionary_tables;"' \
|| echo "QUERY_FAILED"
sleep 1
done | tee /tmp/ndb3_ndbinfo_probe.log
Excerpt:
2026-01-27T02:05:54,851310305+00:00
COUNT(*)
15
2026-01-27T02:05:56,006167205+00:00
COUNT(*)
15
...
2026-01-27T02:06:01,693725881+00:00
COUNT(*)
15
How to repeat:
Deploy an NDB Cluster with multiple data nodes and multiple SQL nodes (mysqld) connected to the same management node (ndb_mgmd).
Induce a failure/restart loop in which multiple data nodes disconnect/fail and the cluster restarts repeatedly (e.g., through network disruption or by killing/restarting multiple data nodes; a docker-based sketch follows these steps).
While the cluster is unstable and restarting, observe mysqld logs. The following message repeats multiple times for ndbinfo tables:
NdbInfo::openTable failed for ./ndbinfo/...
After the cluster stabilizes, verify that ndbinfo queries are normal again (e.g., SELECT COUNT(*) FROM ndbinfo.dictionary_tables; returns a stable value).
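A minimal sketch of the failure/restart loop described above, assuming the data nodes run in containers named ndbd1 and ndbd2 (hypothetical names; substitute the actual container names and adjust the sleep intervals to the cluster's restart time):
for i in $(seq 1 5); do
  docker kill ndbd1 ndbd2    # simulate simultaneous data node failure
  sleep 30
  docker start ndbd1 ndbd2   # data nodes begin restarting
  sleep 60                   # interrupt again before recovery completes
done
While the loop runs, the repeated message can be watched on an SQL node with:
docker logs -f ndb3 2>&1 | grep "NdbInfo::openTable failed"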
Suggested fix:
If temporary ndbinfo unavailability during cluster restart is expected, reduce log spam and improve diagnosability by:
adding exponential backoff / rate limiting for repeated openTable failures, and/or
including the underlying NdbApi error code/reason in the log message to clarify whether the failure is due to cluster not ready, dictionary unavailable, or another cause.
If it is not expected, please advise whether ndbinfo should remain accessible (or become accessible earlier) during recovery and which additional diagnostics to collect.