Description:
I found a reproducible NDB data divergence candidate in MySQL Cluster Community Server 9.3.0-cluster.
I used two otherwise identical NDB clusters side by side (A and B) to detect behavioral differences under the same data seed and the same SQL workload. The only intentional difference was:
- Cluster A: default / baseline setting
- Cluster B: [ndbd default] TimeBetweenGlobalCheckpoints changed from 2000 to 20
Topology on each side:
- 1 management node
- 4 data nodes (ndbmtd)
- 4 SQL/API nodes (mysqld)
Both sides were healthy before the failing phase:
- all data nodes were started
- all SQL/API nodes were connected
- local SQL health checks succeeded
- metadata / table definitions matched on both sides
Test shape:
- database: depstatepp_bughunt
- 4 NDB tables involved:
- trial_case_000000_depstate_canary
- trial_case_000000_0
- trial_case_000000_1
- trial_case_000000_2
- all tables use ENGINE=NDBCLUSTER
- the 3 main trial tables have the same schema pattern:
(case_id, table_idx, k, node_hint, payload, v)
with PRIMARY KEY(case_id, table_idx, k)
and KEY idx_node_hint(node_hint)
What I did:
1. Seeded both clusters with the same initial data.
2. Ran the same pre-action SQL on both sides using 4 concurrent SQL clients.
3. Executed a management metadata command on both sides:
ndb_mgm -e SHOW
4. Immediately after that, ran the same post-action SQL on both sides, again with 4 concurrent SQL clients.
The post-action SQL pattern was:
- one canary INSERT/UPSERT per client
- then, for each client shard, 3 UPDATE statements:
UPDATE trial_case_000000_X
SET v = v + 7
WHERE case_id='case_000000' AND node_hint=<shard>
- each UPDATE was followed by a COUNT/SUM query
What I expected:
Since the two sides had the same seed data, the same table definitions, the same client-sharded SQL, and all updates were logically shard-local by node_hint, I expected both sides either:
1) to complete successfully and end in the same final data state, or
2) to fail in the same way.
What actually happened:
- On cluster A, SQL client 3 (running on depstate-a-ndb4) failed in the post-action phase with:
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
- The failure happened at line 4 of that client script, i.e. on the first post-action UPDATE:
UPDATE trial_case_000000_0
SET v = v + 7
WHERE case_id='case_000000' AND node_hint=3
- On cluster B, the corresponding client completed successfully.
- After the run, both sides were still healthy and metadata remained identical.
- Row counts were also identical:
101 rows on A and 101 rows on B
- Table counts were identical:
5 tables on A and 5 tables on B
- But the final data-state hashes diverged:
A: 09f576c34eeacff9340d0406c435c354e245ff1e1b952d8b2a4605d37157835d
B: 22e1afacae64c65fd32b00a146133be871eaec4b9556d770d66186abed2ce100
The total value sum also diverged:
- A total v sum: 6038602
- B total v sum: 6038770
The difference is exactly 168, which matches 24 rows not receiving the expected +7 update
(8 rows per table × 3 tables × 7 = 168).
So the visible symptom is:
one side gets a lock wait timeout in a shard-local post-action UPDATE, the other side succeeds, and the final NDB data state diverges even though metadata and row counts remain identical.
How to repeat:
This issue is easiest to observe with two side-by-side NDB clusters, because the bug is a final state divergence under the same workload.
Environment:
- MySQL Cluster Community Server 9.3.0-cluster
- Linux / Docker
- Per cluster:
- 1 x ndb_mgmd
- 4 x ndbmtd
- 4 x mysqld (API / SQL)
- Same schema and same seed data on both sides
Config:
- Cluster A:
[ndbd default]
TimeBetweenGlobalCheckpoints=2000
- Cluster B:
[ndbd default]
TimeBetweenGlobalCheckpoints=20
Timeline / sequence:
T0. Start both clusters and wait until both are healthy.
On both sides, verify:
- ndb_mgm -e SHOW shows 4 data nodes and 4 mysqld(API) nodes connected
- local SQL health check succeeds (for example SELECT 1)
T1. On both sides, create the same schema:
CREATE DATABASE IF NOT EXISTS depstatepp_bughunt;
USE depstatepp_bughunt;
CREATE TABLE IF NOT EXISTS trial_case_000000_depstate_canary (
case_id VARCHAR(64) NOT NULL,
table_idx INT NOT NULL,
k INT NOT NULL,
node_hint INT NOT NULL,
payload VARCHAR(192) NOT NULL,
v BIGINT NOT NULL,
PRIMARY KEY(case_id, table_idx, k)
) ENGINE=NDBCLUSTER;
CREATE TABLE IF NOT EXISTS trial_case_000000_0 (
case_id VARCHAR(64) NOT NULL,
table_idx INT NOT NULL,
k INT NOT NULL,
node_hint INT NOT NULL,
payload VARCHAR(192) NOT NULL,
v BIGINT NOT NULL,
PRIMARY KEY(case_id, table_idx, k),
KEY idx_node_hint(node_hint)
) ENGINE=NDBCLUSTER;
CREATE TABLE IF NOT EXISTS trial_case_000000_1 (
case_id VARCHAR(64) NOT NULL,
table_idx INT NOT NULL,
k INT NOT NULL,
node_hint INT NOT NULL,
payload VARCHAR(192) NOT NULL,
v BIGINT NOT NULL,
PRIMARY KEY(case_id, table_idx, k),
KEY idx_node_hint(node_hint)
) ENGINE=NDBCLUSTER;
CREATE TABLE IF NOT EXISTS trial_case_000000_2 (
case_id VARCHAR(64) NOT NULL,
table_idx INT NOT NULL,
k INT NOT NULL,
node_hint INT NOT NULL,
payload VARCHAR(192) NOT NULL,
v BIGINT NOT NULL,
PRIMARY KEY(case_id, table_idx, k),
KEY idx_node_hint(node_hint)
) ENGINE=NDBCLUSTER;
T2. Seed both sides with the same initial data.
Important properties of the seed:
- same case_id = 'case_000000'
- same rows on A and B
- 32 rows in each of the 3 main trial tables
- node_hint distributed across 0,1,2,3
- 5 canary rows in the canary table
T3. Run the same pre-action workload on both clusters using 4 concurrent SQL clients.
Each client is responsible for one shard/node_hint:
- client 0 -> node_hint=0
- client 1 -> node_hint=1
- client 2 -> node_hint=2
- client 3 -> node_hint=3
T4. Immediately after pre-action completes, run on both management nodes:
ndb_mgm -e SHOW
T5. Immediately after T4, run the following post-action workload concurrently on both clusters, again with 4 SQL clients.
For client 3 (the shard that failed in the captured run), the SQL is:
USE depstatepp_bughunt;
INSERT INTO trial_case_000000_depstate_canary
(case_id,table_idx,k,node_hint,payload,v)
VALUES('case_000000',-1,3,3,'depstate_canary:case_000000:post_action:sql_client_3:3',900303)
ON DUPLICATE KEY UPDATE
node_hint=VALUES(node_hint),
payload=VALUES(payload),
v=VALUES(v);
UPDATE trial_case_000000_0
SET v=v+7
WHERE case_id='case_000000' AND node_hint=3;
SELECT COUNT(*), COALESCE(SUM(v),0)
FROM trial_case_000000_0
WHERE node_hint=3;
UPDATE trial_case_000000_1
SET v=v+7
WHERE case_id='case_000000' AND node_hint=3;
SELECT COUNT(*), COALESCE(SUM(v),0)
FROM trial_case_000000_1
WHERE node_hint=3;
UPDATE trial_case_000000_2
SET v=v+7
WHERE case_id='case_000000' AND node_hint=3;
SELECT COUNT(*), COALESCE(SUM(v),0)
FROM trial_case_000000_2
WHERE node_hint=3;
At the same time, run the analogous node_hint=0/1/2 scripts in the other 3 clients.
T6. Compare the results on both sides.
In the captured failing run:
- A/client 3 failed with:
ERROR 1205 (HY000) at line 4: Lock wait timeout exceeded; try restarting transaction
- B/client 3 completed successfully
- all other visible health checks remained OK
T7. Compare final state, not only metadata.
A simple whole-state check is:
- total row count across the test tables
- total SUM(v)
- ordered hash over all rows in the 4 test tables
Observed failing result:
- same row count on both sides
- same table metadata on both sides
- different final data-state hash
- total SUM(v) differed by 168
This suggests a real final-state divergence, not just a transient client-side timeout.
Suggested fix:
Please investigate whether a management metadata command (ndb_mgm -e SHOW) executed between two phases of a concurrent NDB workload can interact badly with transaction / lock handling, especially when TimeBetweenGlobalCheckpoints is changed aggressively.
The unexpected part is not only the lock wait timeout on one side, but the fact that two otherwise identical runs end in different final NDB data states even though metadata and row counts remain identical.
At minimum, it would be helpful to understand:
1) why a shard-local UPDATE on node_hint=3 can hit ERROR 1205 here,
2) why the corresponding run on the other side succeeds,
3) whether the failed transaction path can lead to a partially divergent final state under this workload pattern.
Description: I found a reproducible NDB data divergence candidate in MySQL Cluster Community Server 9.3.0-cluster. I used two otherwise identical NDB clusters side by side (A and B) to detect behavioral differences under the same data seed and the same SQL workload. The only intentional difference was: - Cluster A: default / baseline setting - Cluster B: [ndbd default] TimeBetweenGlobalCheckpoints changed from 2000 to 20 Topology on each side: - 1 management node - 4 data nodes (ndbmtd) - 4 SQL/API nodes (mysqld) Both sides were healthy before the failing phase: - all data nodes were started - all SQL/API nodes were connected - local SQL health checks succeeded - metadata / table definitions matched on both sides Test shape: - database: depstatepp_bughunt - 4 NDB tables involved: - trial_case_000000_depstate_canary - trial_case_000000_0 - trial_case_000000_1 - trial_case_000000_2 - all tables use ENGINE=NDBCLUSTER - the 3 main trial tables have the same schema pattern: (case_id, table_idx, k, node_hint, payload, v) with PRIMARY KEY(case_id, table_idx, k) and KEY idx_node_hint(node_hint) What I did: 1. Seeded both clusters with the same initial data. 2. Ran the same pre-action SQL on both sides using 4 concurrent SQL clients. 3. Executed a management metadata command on both sides: ndb_mgm -e SHOW 4. Immediately after that, ran the same post-action SQL on both sides, again with 4 concurrent SQL clients. The post-action SQL pattern was: - one canary INSERT/UPSERT per client - then, for each client shard, 3 UPDATE statements: UPDATE trial_case_000000_X SET v = v + 7 WHERE case_id='case_000000' AND node_hint=<shard> - each UPDATE was followed by a COUNT/SUM query What I expected: Since the two sides had the same seed data, the same table definitions, the same client-sharded SQL, and all updates were logically shard-local by node_hint, I expected both sides either: 1) to complete successfully and end in the same final data state, or 2) to fail in the same way. What actually happened: - On cluster A, SQL client 3 (running on depstate-a-ndb4) failed in the post-action phase with: ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction - The failure happened at line 4 of that client script, i.e. on the first post-action UPDATE: UPDATE trial_case_000000_0 SET v = v + 7 WHERE case_id='case_000000' AND node_hint=3 - On cluster B, the corresponding client completed successfully. - After the run, both sides were still healthy and metadata remained identical. - Row counts were also identical: 101 rows on A and 101 rows on B - Table counts were identical: 5 tables on A and 5 tables on B - But the final data-state hashes diverged: A: 09f576c34eeacff9340d0406c435c354e245ff1e1b952d8b2a4605d37157835d B: 22e1afacae64c65fd32b00a146133be871eaec4b9556d770d66186abed2ce100 The total value sum also diverged: - A total v sum: 6038602 - B total v sum: 6038770 The difference is exactly 168, which matches 24 rows not receiving the expected +7 update (8 rows per table × 3 tables × 7 = 168). So the visible symptom is: one side gets a lock wait timeout in a shard-local post-action UPDATE, the other side succeeds, and the final NDB data state diverges even though metadata and row counts remain identical. How to repeat: This issue is easiest to observe with two side-by-side NDB clusters, because the bug is a final state divergence under the same workload. Environment: - MySQL Cluster Community Server 9.3.0-cluster - Linux / Docker - Per cluster: - 1 x ndb_mgmd - 4 x ndbmtd - 4 x mysqld (API / SQL) - Same schema and same seed data on both sides Config: - Cluster A: [ndbd default] TimeBetweenGlobalCheckpoints=2000 - Cluster B: [ndbd default] TimeBetweenGlobalCheckpoints=20 Timeline / sequence: T0. Start both clusters and wait until both are healthy. On both sides, verify: - ndb_mgm -e SHOW shows 4 data nodes and 4 mysqld(API) nodes connected - local SQL health check succeeds (for example SELECT 1) T1. On both sides, create the same schema: CREATE DATABASE IF NOT EXISTS depstatepp_bughunt; USE depstatepp_bughunt; CREATE TABLE IF NOT EXISTS trial_case_000000_depstate_canary ( case_id VARCHAR(64) NOT NULL, table_idx INT NOT NULL, k INT NOT NULL, node_hint INT NOT NULL, payload VARCHAR(192) NOT NULL, v BIGINT NOT NULL, PRIMARY KEY(case_id, table_idx, k) ) ENGINE=NDBCLUSTER; CREATE TABLE IF NOT EXISTS trial_case_000000_0 ( case_id VARCHAR(64) NOT NULL, table_idx INT NOT NULL, k INT NOT NULL, node_hint INT NOT NULL, payload VARCHAR(192) NOT NULL, v BIGINT NOT NULL, PRIMARY KEY(case_id, table_idx, k), KEY idx_node_hint(node_hint) ) ENGINE=NDBCLUSTER; CREATE TABLE IF NOT EXISTS trial_case_000000_1 ( case_id VARCHAR(64) NOT NULL, table_idx INT NOT NULL, k INT NOT NULL, node_hint INT NOT NULL, payload VARCHAR(192) NOT NULL, v BIGINT NOT NULL, PRIMARY KEY(case_id, table_idx, k), KEY idx_node_hint(node_hint) ) ENGINE=NDBCLUSTER; CREATE TABLE IF NOT EXISTS trial_case_000000_2 ( case_id VARCHAR(64) NOT NULL, table_idx INT NOT NULL, k INT NOT NULL, node_hint INT NOT NULL, payload VARCHAR(192) NOT NULL, v BIGINT NOT NULL, PRIMARY KEY(case_id, table_idx, k), KEY idx_node_hint(node_hint) ) ENGINE=NDBCLUSTER; T2. Seed both sides with the same initial data. Important properties of the seed: - same case_id = 'case_000000' - same rows on A and B - 32 rows in each of the 3 main trial tables - node_hint distributed across 0,1,2,3 - 5 canary rows in the canary table T3. Run the same pre-action workload on both clusters using 4 concurrent SQL clients. Each client is responsible for one shard/node_hint: - client 0 -> node_hint=0 - client 1 -> node_hint=1 - client 2 -> node_hint=2 - client 3 -> node_hint=3 T4. Immediately after pre-action completes, run on both management nodes: ndb_mgm -e SHOW T5. Immediately after T4, run the following post-action workload concurrently on both clusters, again with 4 SQL clients. For client 3 (the shard that failed in the captured run), the SQL is: USE depstatepp_bughunt; INSERT INTO trial_case_000000_depstate_canary (case_id,table_idx,k,node_hint,payload,v) VALUES('case_000000',-1,3,3,'depstate_canary:case_000000:post_action:sql_client_3:3',900303) ON DUPLICATE KEY UPDATE node_hint=VALUES(node_hint), payload=VALUES(payload), v=VALUES(v); UPDATE trial_case_000000_0 SET v=v+7 WHERE case_id='case_000000' AND node_hint=3; SELECT COUNT(*), COALESCE(SUM(v),0) FROM trial_case_000000_0 WHERE node_hint=3; UPDATE trial_case_000000_1 SET v=v+7 WHERE case_id='case_000000' AND node_hint=3; SELECT COUNT(*), COALESCE(SUM(v),0) FROM trial_case_000000_1 WHERE node_hint=3; UPDATE trial_case_000000_2 SET v=v+7 WHERE case_id='case_000000' AND node_hint=3; SELECT COUNT(*), COALESCE(SUM(v),0) FROM trial_case_000000_2 WHERE node_hint=3; At the same time, run the analogous node_hint=0/1/2 scripts in the other 3 clients. T6. Compare the results on both sides. In the captured failing run: - A/client 3 failed with: ERROR 1205 (HY000) at line 4: Lock wait timeout exceeded; try restarting transaction - B/client 3 completed successfully - all other visible health checks remained OK T7. Compare final state, not only metadata. A simple whole-state check is: - total row count across the test tables - total SUM(v) - ordered hash over all rows in the 4 test tables Observed failing result: - same row count on both sides - same table metadata on both sides - different final data-state hash - total SUM(v) differed by 168 This suggests a real final-state divergence, not just a transient client-side timeout. Suggested fix: Please investigate whether a management metadata command (ndb_mgm -e SHOW) executed between two phases of a concurrent NDB workload can interact badly with transaction / lock handling, especially when TimeBetweenGlobalCheckpoints is changed aggressively. The unexpected part is not only the lock wait timeout on one side, but the fact that two otherwise identical runs end in different final NDB data states even though metadata and row counts remain identical. At minimum, it would be helpful to understand: 1) why a shard-local UPDATE on node_hint=3 can hit ERROR 1205 here, 2) why the corresponding run on the other side succeeds, 3) whether the failed transaction path can lead to a partially divergent final state under this workload pattern.