Bug #116218 unconsistency bug in mysql NDB cluster
Submitted: 25 Sep 2024 8:18 Modified: 25 Sep 2024 13:32
Reporter: CunDi Fang Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S3 (Non-critical)
Version:8.0.35-cluster MySQL Cluster Community S OS:Any
Assigned to: MySQL Verification Team CPU Architecture:Any

[25 Sep 2024 8:18] CunDi Fang
Description:
The following types of warning messages appear several times in the log:
```
2024-09-23 13:31:36 [MgmtSrvr] WARNING  -- Node 4: index 22 stats version 0: clean new: error 280 line 2554
2024-09-23 13:31:36 [MgmtSrvr] WARNING  -- Node 2: DICT: index 22 stats auto-update error: 280
```
These warnings indicate that node 5 encountered an error when attempting to clean up or update index statistics during the automatic index statistics update process, with an error code of 280 and pointing to a specific line number (line 2554).

How to repeat:
The specific error log is as follows:
```
2024-09-23 13:33:59 [MgmtSrvr] INFO     -- Node 2: DICT: index 37 stats auto-update starting
2024-09-23 13:33:59 [MgmtSrvr] INFO     -- Node 5: Communication to Node 6 opened
2024-09-23 13:33:59 [MgmtSrvr] INFO     -- Node 5: Communication to Node 7 opened
2024-09-23 13:33:59 [MgmtSrvr] INFO     -- Node 5: Communication to Node 8 opened
2024-09-23 13:33:59 [MgmtSrvr] INFO     -- Node 5: Communication to Node 9 opened
2024-09-23 13:34:00 [MgmtSrvr] INFO     -- Node 5: Node 6 Connected
2024-09-23 13:34:00 [MgmtSrvr] INFO     -- Node 5: Node 7 Connected
2024-09-23 13:34:00 [MgmtSrvr] INFO     -- Node 5: Node 8 Connected
2024-09-23 13:34:00 [MgmtSrvr] INFO     -- Node 5: Node 6: API mysql-8.0.35 ndb-8.0.35
2024-09-23 13:34:00 [MgmtSrvr] INFO     -- Node 5: Node 8: API mysql-8.0.35 ndb-8.0.35
2024-09-23 13:34:00 [MgmtSrvr] INFO     -- Node 5: Node 7: API mysql-8.0.35 ndb-8.0.35
2024-09-23 13:34:00 [MgmtSrvr] INFO     -- Node 5: Node 9 Connected
2024-09-23 13:34:00 [MgmtSrvr] INFO     -- Node 5: Node 9: API mysql-8.0.35 ndb-8.0.35
2024-09-23 13:34:00 [MgmtSrvr] INFO     -- Node 5: Suma: initiate handover for startup with nodes 0000000000000000000000000000000000000008 GCI: 5731483
2024-09-23 13:34:00 [MgmtSrvr] INFO     -- Node 5: Suma: handover from node 3 gci: 5731483 buckets: 00000002 (2)
2024-09-23 13:34:00 [MgmtSrvr] INFO     -- Node 2: DICT: index 37 stats auto-update starting
2024-09-23 13:34:03 [MgmtSrvr] INFO     -- Node 2: DICT: index 37 stats auto-update starting - Repeated 2 times
2024-09-23 13:34:05 [MgmtSrvr] INFO     -- Node 2: DICT: index 37 stats auto-update starting
2024-09-23 13:34:05 [MgmtSrvr] INFO     -- Node 5: Start phase 101 completed (node restart)
2024-09-23 13:34:05 [MgmtSrvr] INFO     -- Node 2: NR Status: node=5,OLD=Wait handover of subscriptions,NEW=Restart completed
2024-09-23 13:34:05 [MgmtSrvr] INFO     -- Node 5: Started (mysql-8.0.35 ndb-8.0.35)
2024-09-23 13:34:05 [MgmtSrvr] INFO     -- Node 2: DICT: unlocked by node 5 for NodeRestart
2024-09-23 13:34:05 [MgmtSrvr] INFO     -- Node 2: DICT: index 37 stats auto-update starting
2024-09-23 13:34:06 [MgmtSrvr] INFO     -- Node 5: index 118 stats version 1: scan frag: created 0 samples
2024-09-23 13:34:06 [MgmtSrvr] INFO     -- Node 2: index 119 stats version 1: scan frag: created 0 samples
2024-09-23 13:34:06 [MgmtSrvr] INFO     -- Node 2: index-build table 117 index: 120 processed 0 rows
2024-09-23 13:34:06 [MgmtSrvr] INFO     -- Node 3: index-build table 117 index: 120 processed 0 rows
2024-09-23 13:34:06 [MgmtSrvr] INFO     -- Node 5: index-build table 117 index: 120 processed 0 rows
2024-09-23 13:34:06 [MgmtSrvr] INFO     -- Node 4: index-build table 117 index: 120 processed 0 rows
2024-09-23 13:34:06 [MgmtSrvr] INFO     -- Node 2: index 121 stats version 1: scan frag: created 0 samples
2024-09-23 13:34:06 [MgmtSrvr] INFO     -- Node 4: index-build table 117 index: 122 processed 0 rows
2024-09-23 13:34:06 [MgmtSrvr] INFO     -- Node 5: index-build table 117 index: 122 processed 0 rows
2024-09-23 13:34:06 [MgmtSrvr] INFO     -- Node 2: index-build table 117 index: 122 processed 0 rows
2024-09-23 13:34:06 [MgmtSrvr] INFO     -- Node 3: index-build table 117 index: 122 processed 0 rows
2024-09-23 13:34:06 [MgmtSrvr] INFO     -- Node 2: index 123 stats version 1: scan frag: created 0 samples
2024-09-23 13:34:06 [MgmtSrvr] INFO     -- Node 2: index-build table 117 index: 124 processed 0 rows
2024-09-23 13:34:06 [MgmtSrvr] INFO     -- Node 4: index-build table 117 index: 124 processed 0 rows
2024-09-23 13:34:06 [MgmtSrvr] INFO     -- Node 3: index-build table 117 index: 124 processed 0 rows
2024-09-23 13:34:06 [MgmtSrvr] INFO     -- Node 5: index-build table 117 index: 124 processed 0 rows
2024-09-23 13:34:06 [MgmtSrvr] INFO     -- Node 2: DICT: index 37 stats auto-update starting
2024-09-23 13:34:06 [MgmtSrvr] INFO     -- Node 5: index 37 stats version 4: scan frag: created 1 samples
2024-09-23 13:34:06 [MgmtSrvr] INFO     -- Node 2: DICT: index 37 stats auto-update done
2024-09-23 13:34:07 [MgmtSrvr] INFO     -- Node 2: DICT: index 39 stats auto-update starting
2024-09-23 13:34:07 [MgmtSrvr] INFO     -- Node 5: index 39 stats version 3: scan frag: created 0 samples
2024-09-23 13:34:07 [MgmtSrvr] INFO     -- Node 2: DICT: index 39 stats auto-update done
2024-09-23 13:34:07 [MgmtSrvr] INFO     -- Node 5: Node shutdown initiated
2024-09-23 13:34:08 [MgmtSrvr] INFO     -- Node 2: DICT: index 42 stats auto-update starting
2024-09-23 13:34:08 [MgmtSrvr] INFO     -- Node 5: index 42 stats version 4: scan frag: created 1 samples
2024-09-23 13:34:08 [MgmtSrvr] INFO     -- Node 2: DICT: index 42 stats auto-update done
2024-09-23 13:34:09 [MgmtSrvr] INFO     -- Node 2: DICT: index 43 stats auto-update starting
2024-09-23 13:34:09 [MgmtSrvr] INFO     -- Node 2: index 43 stats version 4: scan frag: created 1 samples
2024-09-23 13:34:09 [MgmtSrvr] INFO     -- Node 2: DICT: index 43 stats auto-update done
2024-09-23 13:34:10 [MgmtSrvr] INFO     -- Node 2: DICT: index 45 stats auto-update starting
2024-09-23 13:34:10 [MgmtSrvr] INFO     -- Node 2: index 45 stats version 3: scan frag: created 0 samples
2024-09-23 13:34:10 [MgmtSrvr] INFO     -- Node 2: DICT: index 45 stats auto-update done
2024-09-23 13:34:11 [MgmtSrvr] INFO     -- Node 2: DICT: index 47 stats auto-update starting
2024-09-23 13:34:11 [MgmtSrvr] INFO     -- Node 2: index 47 stats version 4: scan frag: created 1 samples
2024-09-23 13:34:11 [MgmtSrvr] INFO     -- Node 2: DICT: index 47 stats auto-update done
2024-09-23 13:34:12 [MgmtSrvr] INFO     -- Node 2: DICT: index 50 stats auto-update starting
2024-09-23 13:34:12 [MgmtSrvr] WARNING  -- Node 5: index 50 stats version 0: clean new: error 280 line 2554
2024-09-23 13:34:12 [MgmtSrvr] WARNING  -- Node 2: DICT: index 50 stats auto-update error: 280
2024-09-23 13:34:13 [MgmtSrvr] INFO     -- Node 2: DICT: index 51 stats auto-update starting
2024-09-23 13:34:13 [MgmtSrvr] WARNING  -- Node 5: index 51 stats version 0: clean new: error 280 line 2554
2024-09-23 13:34:13 [MgmtSrvr] WARNING  -- Node 2: DICT: index 51 stats auto-update error: 280
2024-09-23 13:34:14 [MgmtSrvr] INFO     -- Node 2: DICT: index 53 stats auto-update starting
2024-09-23 13:34:14 [MgmtSrvr] WARNING  -- Node 5: index 53 stats version 0: clean new: error 280 line 2554
2024-09-23 13:34:14 [MgmtSrvr] WARNING  -- Node 2: DICT: index 53 stats auto-update error: 280
2024-09-23 13:34:15 [MgmtSrvr] INFO     -- Node 2: DICT: index 58 stats auto-update starting
2024-09-23 13:34:15 [MgmtSrvr] WARNING  -- Node 5: index 58 stats version 0: clean new: error 280 line 2554
2024-09-23 13:34:15 [MgmtSrvr] WARNING  -- Node 2: DICT: index 58 stats auto-update error: 280
2024-09-23 13:34:16 [MgmtSrvr] INFO     -- Node 5: Suma: initiate handover for shutdown with nodes 0000000000000000000000000000000000000008 GCI: 5731491
2024-09-23 13:34:16 [MgmtSrvr] INFO     -- Node 5: Suma: handover to node 3 gci: 5731491 buckets: 00000002 (2)
2024-09-23 13:34:17 [MgmtSrvr] INFO     -- Node 2: DICT: index 60 stats auto-update starting
2024-09-23 13:34:17 [MgmtSrvr] WARNING  -- Node 5: index 60 stats version 0: clean new: error 280 line 2554
2024-09-23 13:34:17 [MgmtSrvr] WARNING  -- Node 2: DICT: index 60 stats auto-update error: 280
2024-09-23 13:34:18 [MgmtSrvr] INFO     -- Node 2: DICT: index 67 stats auto-update starting
2024-09-23 13:34:18 [MgmtSrvr] WARNING  -- Node 5: index 67 stats version 0: clean new: error 280 line 2554
2024-09-23 13:34:18 [MgmtSrvr] WARNING  -- Node 2: DICT: index 67 stats auto-update error: 280
2024-09-23 13:34:19 [MgmtSrvr] INFO     -- Node 2: DICT: index 69 stats auto-update starting
2024-09-23 13:34:19 [MgmtSrvr] WARNING  -- Node 5: index 69 stats version 0: clean new: error 280 line 2554
2024-09-23 13:34:19 [MgmtSrvr] WARNING  -- Node 2: DICT: index 69 stats auto-update error: 280
2024-09-23 13:34:20 [MgmtSrvr] INFO     -- Node 2: DICT: index 71 stats auto-update starting
2024-09-23 13:34:20 [MgmtSrvr] WARNING  -- Node 5: index 71 stats version 0: clean new: error 280 line 2554
2024-09-23 13:34:20 [MgmtSrvr] WARNING  -- Node 2: DICT: index 71 stats auto-update error: 280
2024-09-23 13:34:21 [MgmtSrvr] INFO     -- Node 2: DICT: index 73 stats auto-update starting
2024-09-23 13:34:21 [MgmtSrvr] WARNING  -- Node 5: index 73 stats version 0: clean new: error 280 line 2554
2024-09-23 13:34:21 [MgmtSrvr] WARNING  -- Node 2: DICT: index 73 stats auto-update error: 280
2024-09-23 13:34:21 [MgmtSrvr] INFO     -- Node 5: Suma: handover complete
2024-09-23 13:34:22 [MgmtSrvr] INFO     -- Node 2: DICT: index 75 stats auto-update starting
2024-09-23 13:34:22 [MgmtSrvr] WARNING  -- Node 5: index 75 stats version 0: clean new: error 280 line 2554
2024-09-23 13:34:22 [MgmtSrvr] WARNING  -- Node 2: DICT: index 75 stats auto-update error: 280
2024-09-23 13:34:23 [MgmtSrvr] INFO     -- Node 2: DICT: index 85 stats auto-update starting
2024-09-23 13:34:23 [MgmtSrvr] INFO     -- Node 5 disconnected in recv with errnum: 104 in state: 0
2024-09-23 13:34:23 [MgmtSrvr] ALERT    -- Node 4: Node 5 Disconnected
2024-09-23 13:34:23 [MgmtSrvr] INFO     -- Node 2: Communication to Node 5 closed
2024-09-23 13:34:23 [MgmtSrvr] INFO     -- Node 4: Communication to Node 5 closed
2024-09-23 13:34:23 [MgmtSrvr] INFO     -- Node 3: Communication to Node 5 closed
2024-09-23 13:34:23 [MgmtSrvr] ALERT    -- Node 1: Node 5 Disconnected
2024-09-23 13:34:23 [MgmtSrvr] ALERT    -- Node 2: Arbitration check won - node group majority
2024-09-23 13:34:23 [MgmtSrvr] INFO     -- Node 2: President restarts arbitration thread [state=6]
2024-09-23 13:34:23 [MgmtSrvr] INFO     -- Node 2: NR Status: node=5,OLD=Restart completed,NEW=Node failed, fail handling ongoing
2024-09-23 13:34:23 [MgmtSrvr] INFO     -- Node 4: NR Status: node=5,OLD=Restart completed,NEW=Node failed, fail handling ongoing
2024-09-23 13:34:23 [MgmtSrvr] INFO     -- Node 3: NR Status: node=5,OLD=Restart completed,NEW=Node failed, fail handling ongoing
2024-09-23 13:34:23 [MgmtSrvr] INFO     -- Node 2: index 85 stats version 3: scan frag: created 0 samples
2024-09-23 13:34:23 [MgmtSrvr] INFO     -- Node 2: DICT: index 85 stats auto-update done
2024-09-23 13:34:23 [MgmtSrvr] ALERT    -- Node 3: Node 5 Disconnected
2024-09-23 13:34:23 [MgmtSrvr] INFO     -- Node 5: Node shutdown completed.
2024-09-23 13:34:23 [MgmtSrvr] ALERT    -- Node 2: Node 5 Disconnected
2024-09-23 13:34:23 [MgmtSrvr] INFO     -- Alloc node id 5 rejected with error code 1703, will retry
2024-09-23 13:34:24 [MgmtSrvr] INFO     -- Node 4: NR Status: node=5,OLD=Node failed, fail handling ongoing,NEW=Node failure handling complete
2024-09-23 13:34:24 [MgmtSrvr] INFO     -- Node 3: NR Status: node=5,OLD=Node failed, fail handling ongoing,NEW=Node failure handling complete
2024-09-23 13:34:24 [MgmtSrvr] INFO     -- Node 2: NR Status: node=5,OLD=Node failed, fail handling ongoing,NEW=Node failure handling complete
```

Suggested fix:
This will eventually lead to node 5 not being able to maintain a stable connection with the other nodes in the cluster, resulting in overall cluster instability. And it may interfere with the normal update process of index statistics. There should be a problem in processing index statistics.
[25 Sep 2024 8:48] MySQL Verification Team
Hi,

What is the CPU load on this box? Anything in system log?

Can you explain how to reproduce this?

Thanks
[25 Sep 2024 13:17] CunDi Fang
OK, here is mine CPU:
```
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         48 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  256
  On-line CPU(s) list:   0-255
Vendor ID:               AuthenticAMD
  Model name:            AMD EPYC 7763 64-Core Processor
    CPU family:          25
    Model:               1
    Thread(s) per core:  2
    Core(s) per socket:  64
    Socket(s):           2
    Stepping:            1
    Frequency boost:     enabled
    CPU max MHz:         3529.0520
    CPU min MHz:         1500.0000
```
About how to reproduce, I just remember that I was restarting the cluster and backup data. However, here is the SQL log of two SQL servers, I hope that can help you:

log1: mysql ndb cluster-116218-log1.txt
log2: mysql ndb cluster-116218-log2.txt

also, I read the previous log files on the management node and found that before the inconsistency occurred, node 5 was in the process of restarting and data synchronisation, then during the data synchronisation process, node 2 started an index update, and then the problem occurred, as shown in the following log file:

log:mysql ndb cluster-116218-log3.txt
[25 Sep 2024 13:35] MySQL Verification Team
This looks like a duplicate of Bug#116216