Description:
I observed an NDB DDL failure during an early recovery window after restarting data nodes in MySQL Cluster Community Server 9.3.0-cluster.
Environment:
- MySQL Cluster Community Server 9.3.0-cluster
- Linux / Docker-based test environment
- Per cluster:
  - 1 management node
  - 4 data nodes (ndbmtd)
  - 4 SQL/API nodes (mysqld)
I originally noticed this while comparing two side-by-side clusters, but the core symptom does not depend on the differential setup. The relevant symptom is that CREATE TABLE ... ENGINE=NDBCLUSTER can fail during the early recovery window after restarting data nodes.
Configuration difference used in the captured run:
- baseline side: [ndbd default] TimeBetweenGlobalCheckpoints=2000
- mutated side: [ndbd default] TimeBetweenGlobalCheckpoints=20
What I did:
1. Started a healthy NDB cluster.
2. Created and populated NDB test tables.
3. Issued data-node restart commands from the management node:
- "2 RESTART" succeeded
- an immediate "3 RESTART" failed with:
5063 - Operation not allowed while nodes are starting or stopping
4. In the early recovery window after that restart activity, I executed a DDL-heavy SQL workload.
What I expected:
I expected CREATE TABLE ... ENGINE=NDBCLUSTER either:
- to succeed, or
- to fail with a more specific and clearly recovery-related error if the cluster was not yet ready for NDB DDL.
What actually happened:
The workload reached creation of the third table:
CREATE TABLE IF NOT EXISTS trial_case_000013_2 (...) ENGINE=NDBCLUSTER;
and MySQL returned on both captured sides:
ERROR 1005 (HY000): Can't create table 'trial_case_000013_2' (use SHOW WARNINGS for more info)
In the captured logs, table creation was attempted at:
- A side: 2026-06-06T11:27:12.991668Z
- B side: 2026-06-06T11:27:14.398943Z
Later in the same recovery window, the SQL/API logs also showed node-failure / re-subscribe activity for data node 2:
- A side:
2026-06-06T11:27:32.941369Z Data node 2 failed
2026-06-06T11:27:52.840259Z Data node 2 reports subscribe ...
- B side:
2026-06-06T11:28:08.145189Z Data node 2 failed
2026-06-06T11:28:27.681265Z Data node 2 reports subscribe ...
So the reproducible symptom I want to report is:
during the early post-restart recovery window, CREATE TABLE ... ENGINE=NDBCLUSTER may fail with a generic ERROR 1005 instead of succeeding or returning a more specific recovery/readiness error.
How to repeat:
This issue appears to be timing-sensitive. The important part is to execute NDB DDL in the early recovery window immediately after restarting data nodes.
Setup:
- MySQL Cluster Community Server 9.3.0-cluster
- 1 x ndb_mgmd
- 4 x ndbmtd data nodes
- 4 x mysqld SQL/API nodes
Optional configuration used in the captured run:
[ndbd default]
TimeBetweenGlobalCheckpoints=20
Steps:
1. Start the cluster and wait until all data nodes are started:
ndb_mgm -e SHOW
2. Connect through one SQL/API node and create two NDB tables successfully, for example:
CREATE DATABASE IF NOT EXISTS depstatepp_bughunt;
USE depstatepp_bughunt;
CREATE TABLE IF NOT EXISTS trial_case_000013_0 (
  case_id VARCHAR(64) NOT NULL,
  table_idx INT NOT NULL,
  k INT NOT NULL,
  node_hint INT NOT NULL,
  payload VARCHAR(192) NOT NULL,
  v BIGINT NOT NULL,
  PRIMARY KEY(case_id, table_idx, k),
  KEY idx_node_hint(node_hint),
  KEY idx_payload(payload)
) ENGINE=NDBCLUSTER;
CREATE TABLE IF NOT EXISTS trial_case_000013_1 (
  case_id VARCHAR(64) NOT NULL,
  table_idx INT NOT NULL,
  k INT NOT NULL,
  node_hint INT NOT NULL,
  payload VARCHAR(192) NOT NULL,
  v BIGINT NOT NULL,
  PRIMARY KEY(case_id, table_idx, k),
  KEY idx_node_hint(node_hint),
  KEY idx_payload(payload)
) ENGINE=NDBCLUSTER;
Insert some rows into both tables.
3. From the management node, trigger restart activity on data nodes:
ndb_mgm -e "2 RESTART"
immediately followed by
ndb_mgm -e "3 RESTART"
In the captured run, the second command returned:
5063 - Operation not allowed while nodes are starting or stopping
4. Do not wait for the cluster to fully stabilize.
Instead, in the early recovery window immediately after step 3, run:
CREATE TABLE IF NOT EXISTS trial_case_000013_2 (
  case_id VARCHAR(64) NOT NULL,
  table_idx INT NOT NULL,
  k INT NOT NULL,
  node_hint INT NOT NULL,
  payload VARCHAR(192) NOT NULL,
  v BIGINT NOT NULL,
  PRIMARY KEY(case_id, table_idx, k),
  KEY idx_node_hint(node_hint),
  KEY idx_payload(payload)
) ENGINE=NDBCLUSTER;
5. At the same time, collect:
- ndb_mgm -e SHOW
- mysqld error log
- any SHOW WARNINGS output immediately after the CREATE TABLE failure
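Because SHOW WARNINGS only reflects the most recent statement of the same session, the failing CREATE TABLE and the SHOW WARNINGS need to go through one client connection. A minimal sketch of one way to do that from a script follows. The function names, the host value, and the assumption that credentials come from a client option file are hypothetical and not part of the captured run; "ALL STATUS" is used here in addition to SHOW to record the per-node start state.

```python
import subprocess


def build_ddl_command(sql_host: str, database: str, ddl: str) -> list[str]:
    """Build a mysql client call that runs the DDL and SHOW WARNINGS in one session."""
    # Normalize the trailing semicolon, then append SHOW WARNINGS.
    script = f"{ddl.rstrip().rstrip(';')};\nSHOW WARNINGS;\n"
    return [
        "mysql",
        f"--host={sql_host}",      # hypothetical SQL/API node address
        f"--database={database}",
        "--force",                 # keep going after the CREATE TABLE error
        "--table",                 # tabular output even in batch mode
        f"--execute={script}",
    ]


def collect(sql_host: str, database: str, ddl: str) -> dict[str, str]:
    """Record data-node state, then run the DDL and keep its output."""
    # Assumes ndb_mgm can reach the management node with its default connect string.
    status = subprocess.run(
        ["ndb_mgm", "-e", "ALL STATUS"], capture_output=True, text=True
    )
    result = subprocess.run(
        build_ddl_command(sql_host, database, ddl), capture_output=True, text=True
    )
    return {
        "node_status": status.stdout,
        "ddl_stdout": result.stdout,   # SHOW WARNINGS rows, if any
        "ddl_stderr": result.stderr,   # the ERROR 1005 line
    }
```

Calling collect("127.0.0.1", "depstatepp_bughunt", "<the CREATE TABLE statement from step 4>") right after step 3 would return the node states and the warning rows together, which should show the underlying NDB error behind ERROR 1005 if one is reported.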
Observed in the captured run:
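- the CREATE TABLE attempt was logged about 20 seconds (A side) and about 54 seconds (B side) before the "Data node 2 failed" message appeared in the SQL/API logs; the re-subscribe message followed roughly 20 seconds after that on both sides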
- CREATE TABLE returned:
ERROR 1005 (HY000): Can't create table 'trial_case_000013_2' (use SHOW WARNINGS for more info)
Relevant captured timestamps:
- A side CREATE TABLE log:
2026-06-06T11:27:12.991668Z [NDB] Creating table 'depstatepp_bughunt.trial_case_000013_2'
- A side data node failure logged about 20 seconds later:
2026-06-06T11:27:32.941369Z Data node 2 failed
- B side CREATE TABLE log:
2026-06-06T11:27:14.398943Z [NDB] Creating table 'depstatepp_bughunt.trial_case_000013_2'
- B side data node failure logged about 54 seconds later:
2026-06-06T11:28:08.145189Z Data node 2 failed
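The intervals between these log entries can be checked directly from the timestamps. The short sketch below uses only values quoted in this report (including the re-subscribe timestamps from the description); the helper names are illustrative.

```python
from datetime import datetime


def parse_ts(ts: str) -> datetime:
    # The log timestamps are UTC with microseconds and a trailing "Z".
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")


# Timestamps copied from the captured logs quoted in this report.
events = {
    "A": {
        "create_table": "2026-06-06T11:27:12.991668Z",
        "node_failed": "2026-06-06T11:27:32.941369Z",
        "resubscribe": "2026-06-06T11:27:52.840259Z",
    },
    "B": {
        "create_table": "2026-06-06T11:27:14.398943Z",
        "node_failed": "2026-06-06T11:28:08.145189Z",
        "resubscribe": "2026-06-06T11:28:27.681265Z",
    },
}


def gaps(side: str) -> dict[str, float]:
    """Return the intervals in seconds between the three logged events."""
    e = {name: parse_ts(ts) for name, ts in events[side].items()}
    return {
        "create_to_failed": (e["node_failed"] - e["create_table"]).total_seconds(),
        "failed_to_resubscribe": (e["resubscribe"] - e["node_failed"]).total_seconds(),
    }


for side in ("A", "B"):
    g = gaps(side)
    print(side, round(g["create_to_failed"], 1), round(g["failed_to_resubscribe"], 1))
```

On these values the CREATE TABLE log entry precedes "Data node 2 failed" by roughly 20 seconds on the A side and roughly 54 seconds on the B side, and the re-subscribe message follows the failure by roughly 20 seconds on both sides.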
The symptom to check is whether NDB DDL in this early recovery window fails with generic ERROR 1005.
Suggested fix:
Please check whether CREATE TABLE ... ENGINE=NDBCLUSTER is allowed to enter a partially unstable recovery window after data-node restart.
If the cluster is not yet ready for NDB DDL, it would be better either:
1) to block the DDL until the cluster is fully ready, or
2) to return a more specific error that clearly indicates recovery / restart state rather than generic ERROR 1005.