MySQL Bugs: #120627: CREATE TABLE may fail during early NDB recovery

Bug #120627	CREATE TABLE may fail during early NDB recovery
Submitted:	8 Jun 7:59	Modified:	26 Jun 8:38
Reporter:	cundi fang
Status:	Open
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S2 (Serious)
Version:	9.3.0	OS:	Ubuntu (22.04)
Assigned to:		CPU Architecture:	Any
Tags:	create-table, error-1005, ndb, recovery, restart, TimeBetweenGlobalCheckpoints

Description:
I observed an NDB DDL failure during an early recovery window after restarting data nodes in MySQL Cluster Community Server 9.3.0-cluster.

Environment:
- MySQL Cluster Community Server 9.3.0-cluster
- Linux / Docker-based test environment
- Per cluster:
  - 1 management node
  - 4 data nodes (ndbmtd)
  - 4 SQL/API nodes (mysqld)

I originally noticed this while comparing two side-by-side clusters, but the core symptom does not depend on the differential setup. The relevant symptom is that CREATE TABLE ... ENGINE=NDBCLUSTER can fail during the early recovery window after restarting data nodes.

Configuration difference used in the captured run:
- baseline side: [ndbd default] TimeBetweenGlobalCheckpoints=2000
- mutated side: [ndbd default] TimeBetweenGlobalCheckpoints=20

What I did:
1. Started a healthy NDB cluster.
2. Created and populated NDB test tables.
3. Issued data-node restart commands from the management node:
   - "2 RESTART" succeeded
   - an immediate "3 RESTART" failed with:
     5063 - Operation not allowed while nodes are starting or stopping
4. In the early recovery window after that restart activity, I executed a DDL-heavy SQL workload.
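
The restart sequence in step 3 could be scripted so that each command's output and exit status are kept next to a UTC timestamp. This is a minimal sketch, assuming ndb_mgm is on PATH and can reach the management server. NDB_MGM (an override variable) and restart_nodes (a helper name) are hypothetical and were not part of the captured run.

```shell
#!/usr/bin/env bash
# Sketch: issue back-to-back data-node restarts and record what each returned.
# NDB_MGM is a hypothetical override hook for the management client binary.
NDB_MGM="${NDB_MGM:-ndb_mgm}"

# restart_nodes NODE_ID...
# Runs "<id> RESTART" for every node id with no pause in between.
# The exit status is recorded as-is; how ndb_mgm reports error 5063 through
# its exit status is not assumed here.
restart_nodes() {
  local node out rc
  for node in "$@"; do
    out="$("$NDB_MGM" -e "$node RESTART" 2>&1)"
    rc=$?
    printf '%s node=%s rc=%s\n%s\n' \
      "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$node" "$rc" "$out"
  done
}
```

Usage, for example: restart_nodes 2 3 | tee restart.log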

What I expected:
I expected CREATE TABLE ... ENGINE=NDBCLUSTER either:
- to succeed, or
- to fail with a more specific and clearly recovery-related error if the cluster was not yet ready for NDB DDL.

What actually happened:
The workload reached creation of the third table:
  CREATE TABLE IF NOT EXISTS trial_case_000013_2 (...) ENGINE=NDBCLUSTER;

and MySQL returned on both captured sides:
  ERROR 1005 (HY000): Can't create table 'trial_case_000013_2' (use SHOW WARNINGS for more info)

In the captured logs, table creation was attempted at:
- A side: 2026-06-06T11:27:12.991668Z
- B side: 2026-06-06T11:27:14.398943Z

Around the same recovery window, the SQL/API logs also showed node-failure / re-subscribe activity for data node 2:
- A side:
  2026-06-06T11:27:32.941369Z  Data node 2 failed
  2026-06-06T11:27:52.840259Z  Data node 2 reports subscribe ...
- B side:
  2026-06-06T11:28:08.145189Z  Data node 2 failed
  2026-06-06T11:28:27.681265Z  Data node 2 reports subscribe ...

So the reproducible symptom I want to report is:
during the early post-restart recovery window, CREATE TABLE ... ENGINE=NDBCLUSTER may fail with a generic ERROR 1005 instead of succeeding or returning a more specific recovery/readiness error.

How to repeat:
This issue appears to be timing-sensitive. The important part is to execute NDB DDL in the early recovery window immediately after restarting data nodes.

Setup:
- MySQL Cluster Community Server 9.3.0-cluster
- 1 x ndb_mgmd
- 4 x ndbmtd data nodes
- 4 x mysqld SQL/API nodes

Optional configuration used in the captured run:
[ndbd default]
TimeBetweenGlobalCheckpoints=20

Steps:

1. Start the cluster and wait until all data nodes are started:
   ndb_mgm -e SHOW

2. Connect through one SQL/API node and create two NDB tables successfully, for example:

   CREATE DATABASE IF NOT EXISTS depstatepp_bughunt;
   USE depstatepp_bughunt;

   CREATE TABLE IF NOT EXISTS trial_case_000013_0 (
     case_id VARCHAR(64) NOT NULL,
     table_idx INT NOT NULL,
     k INT NOT NULL,
     node_hint INT NOT NULL,
     payload VARCHAR(192) NOT NULL,
     v BIGINT NOT NULL,
     PRIMARY KEY(case_id, table_idx, k),
     KEY idx_node_hint(node_hint),
     KEY idx_payload(payload)
   ) ENGINE=NDBCLUSTER;

   CREATE TABLE IF NOT EXISTS trial_case_000013_1 (
     case_id VARCHAR(64) NOT NULL,
     table_idx INT NOT NULL,
     k INT NOT NULL,
     node_hint INT NOT NULL,
     payload VARCHAR(192) NOT NULL,
     v BIGINT NOT NULL,
     PRIMARY KEY(case_id, table_idx, k),
     KEY idx_node_hint(node_hint),
     KEY idx_payload(payload)
   ) ENGINE=NDBCLUSTER;

   Insert some rows into both tables.

3. From the management node, trigger restart activity on data nodes:
   ndb_mgm -e "2 RESTART"
   immediately followed by
   ndb_mgm -e "3 RESTART"

   In the captured run, the second command returned:
   5063 - Operation not allowed while nodes are starting or stopping

4. Do not wait for the cluster to stabilize fully.
   Instead, in the early recovery window immediately after step 3, run:

   CREATE TABLE IF NOT EXISTS trial_case_000013_2 (
     case_id VARCHAR(64) NOT NULL,
     table_idx INT NOT NULL,
     k INT NOT NULL,
     node_hint INT NOT NULL,
     payload VARCHAR(192) NOT NULL,
     v BIGINT NOT NULL,
     PRIMARY KEY(case_id, table_idx, k),
     KEY idx_node_hint(node_hint),
     KEY idx_payload(payload)
   ) ENGINE=NDBCLUSTER;

5. At the same time, collect:
   - ndb_mgm -e SHOW
   - mysqld error log
   - any SHOW WARNINGS output immediately after the CREATE TABLE failure
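
Two parts of the steps above are left open: the row data for step 2 and how to capture SHOW WARNINGS in step 5. The sketch below shows one way to script them. It assumes the mysql client reads its connection settings from an option file such as ~/.my.cnf. The helper names (gen_inserts, ddl_batch, capture_ddl), the MYSQL and NDB_MGM override variables, the output file names, and the row values are all hypothetical and are not taken from the captured run. SHOW WARNINGS is session-scoped, so it is sent in the same batch as the DDL.

```shell
#!/usr/bin/env bash
# Sketch of helpers for steps 2 and 5. MYSQL and NDB_MGM are hypothetical
# override hooks for the client binaries.
MYSQL="${MYSQL:-mysql}"
NDB_MGM="${NDB_MGM:-ndb_mgm}"

# Step 2: emit INSERT statements for one test table.
# Row values are arbitrary placeholders, not the data from the captured run.
gen_inserts() {
  local table="$1" table_idx="$2" rows="${3:-10}" k
  printf 'USE depstatepp_bughunt;\n'
  for (( k = 0; k < rows; k++ )); do
    printf "INSERT INTO %s (case_id, table_idx, k, node_hint, payload, v) VALUES ('case_000013', %d, %d, %d, 'payload_%d', %d);\n" \
      "$table" "$table_idx" "$k" "$(( k % 4 + 1 ))" "$k" "$(( k * 100 ))"
  done
}

# Step 5: build a batch that runs the DDL and SHOW WARNINGS in one session.
# A single trailing semicolon on the DDL is normalized.
ddl_batch() {
  printf 'USE depstatepp_bughunt;\n%s;\nSHOW WARNINGS;\n' "${1%;}"
}

# capture_ddl "DDL statement" [output directory]
capture_ddl() {
  local ddl="$1" outdir="${2:-.}"
  # --force keeps the batch going after a failed CREATE TABLE,
  # so SHOW WARNINGS still runs in the same session.
  ddl_batch "$ddl" | "$MYSQL" --force --table >"$outdir/ddl_output.txt" 2>&1
  # Snapshot the node states right after the DDL attempt.
  "$NDB_MGM" -e SHOW >"$outdir/ndb_mgm_show.txt" 2>&1
}
```

Usage, for example: gen_inserts trial_case_000013_0 0 100 | mysql, and later capture_ddl "$(cat create_table_2.sql)" ./evidence, where create_table_2.sql is a hypothetical file holding the step 4 statement.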

Observed in the captured run:
- the CREATE TABLE attempt was logged before the node-failure / re-subscribe messages became visible in the SQL/API logs: about 20 seconds earlier on the A side and about 54 seconds earlier on the B side (see the timestamps below)
- CREATE TABLE returned:
  ERROR 1005 (HY000): Can't create table 'trial_case_000013_2' (use SHOW WARNINGS for more info)

Relevant captured timestamps:
- A side CREATE TABLE log:
  2026-06-06T11:27:12.991668Z [NDB] Creating table 'depstatepp_bughunt.trial_case_000013_2'
- A side data node failure visible about 20 seconds later:
  2026-06-06T11:27:32.941369Z Data node 2 failed

- B side CREATE TABLE log:
  2026-06-06T11:27:14.398943Z [NDB] Creating table 'depstatepp_bughunt.trial_case_000013_2'
- B side data node failure visible about 54 seconds later:
  2026-06-06T11:28:08.145189Z Data node 2 failed
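
The gap between each "Creating table" line and the matching "Data node 2 failed" line can be computed directly from the timestamps above. This is a minimal sketch in plain bash arithmetic, assuming both timestamps of a pair fall on the same UTC day and carry six fractional digits. The helper names ts_to_us and gap_seconds are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: gap between two log timestamps of the form YYYY-MM-DDTHH:MM:SS.ffffffZ.
# Assumes the same UTC day and that the second timestamp is the later one.

# Convert a timestamp to microseconds since midnight.
ts_to_us() {
  local t="${1#*T}"   # drop the date part
  t="${t%Z}"          # drop the trailing Z
  local h="${t:0:2}" m="${t:3:2}" s="${t:6:2}" f="${t:9:6}"
  # 10# forces base 10 so values such as "08" are not read as octal.
  echo $(( (10#$h * 3600 + 10#$m * 60 + 10#$s) * 1000000 + 10#$f ))
}

# Print the gap in seconds with microsecond precision.
gap_seconds() {
  local d=$(( $(ts_to_us "$2") - $(ts_to_us "$1") ))
  printf '%d.%06d\n' $(( d / 1000000 )) $(( d % 1000000 ))
}

gap_seconds 2026-06-06T11:27:12.991668Z 2026-06-06T11:27:32.941369Z   # A side: 19.949701
gap_seconds 2026-06-06T11:27:14.398943Z 2026-06-06T11:28:08.145189Z   # B side: 53.746246
```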

The symptom to check is whether NDB DDL in this early recovery window fails with generic ERROR 1005.

Suggested fix:
Please check whether CREATE TABLE ... ENGINE=NDBCLUSTER is allowed to run while the cluster is still in a partially unstable recovery window after a data-node restart.

If the cluster is not yet ready for NDB DDL, it would be better either:
1) to block the DDL until the cluster is fully ready, or
2) to return a more specific error that clearly indicates recovery / restart state rather than generic ERROR 1005.
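
Until the server behaves as in option 1, a test harness could approximate it on the client side by holding DDL back until every data node reports "started". This is a minimal sketch, not a fix for the reported behavior. It assumes that ndb_mgm -e "ALL STATUS" prints one line per data node in the form "Node 2: started (...)". The helper names and the NDB_MGM override variable are hypothetical. This report does not establish whether NDB DDL is safe as soon as all nodes report "started".

```shell
#!/usr/bin/env bash
# Sketch: client-side gate that waits for all data nodes to report "started".
# NDB_MGM is a hypothetical override hook for the management client binary.
NDB_MGM="${NDB_MGM:-ndb_mgm}"

# Read "ALL STATUS" output on stdin and print how many data nodes are in any
# state other than "started" (assumed line format: "Node <id>: <state> ...").
count_not_started() {
  awk '/^Node [0-9]+:/ { if ($3 != "started") n++ } END { print n + 0 }'
}

# wait_all_started [timeout_seconds] [poll_interval_seconds]
# Returns 0 once every reported data node is started, 1 on timeout.
wait_all_started() {
  local timeout="${1:-300}" interval="${2:-2}" waited=0 status pending
  while (( waited < timeout )); do
    status="$("$NDB_MGM" -e "ALL STATUS" 2>/dev/null)"
    # Only trust the result if at least one node line was actually returned.
    if grep -qE '^Node [0-9]+:' <<<"$status"; then
      pending="$(count_not_started <<<"$status")"
      (( pending == 0 )) && return 0
    fi
    sleep "$interval"
    (( waited += interval ))
  done
  return 1
}
```

Usage, for example: wait_all_started 300 2 && mysql < create_table.sql, where create_table.sql is a hypothetical file. The ndb_waiter utility shipped with NDB Cluster serves a similar purpose.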

version error

Apply for S2