MySQL Bugs: #114778: lost connection in MySQL NDB cluster

Submitted: 25 Apr 2024 8:01
Modified: 7 May 2024 11:50
Reporter: CunDi Fang
Status: Can't repeat
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version: 8.0.35-cluster MySQL Cluster Community S
OS: Any (20.04)
Assigned to: MySQL Verification Team
CPU Architecture: Any

Description:
Hello, I found a bug in the 8.0.35-cluster version of MySQL Cluster. It causes the current MySQL service to crash.

The details are as follows.

OS version and name:
Ubuntu 22.04.3 LTS (Jammy Jellyfish)
Linux eb1f47b08982 6.5.11-8-pve #1 SMP PREEMPT_DYNAMIC PMX 6.5.11-8 (2024-01-30T12:27Z) x86_64 x86_64 x86_64 GNU/Linux

This bug occurs when I execute two SQL statements at the same time on two different SQL nodes: Poc1 on one node and Poc2 on the other. As far as I have tested, the bug reproduces whenever both statements are executed at the same time. If Poc1 or Poc2 is executed on its own, everything behaves normally: no error is raised and the MySQL service does not crash.

Poc1:
```sql
delete from mytest90.test9
where 
EXISTS (
  select
      ref_0.column3 as c0,
      ref_0.column8 as c1
    from
      mytest90.test3 as ref_0
          right join (select
                (select MAX_TIMER_READ_ONLY from performance_schema.events_transactions_summary_by_host_by_event_name limit 1 offset 67)
                   as c0,
                ref_1.column4 as c1,
                mytest90.test9.column10 as c2,
                (select interval_start from mysql.gtid_executed limit 1 offset 92)
                   as c3,
                mytest90.test9.column9 as c4,
                mytest90.test9.column2 as c5,
                5 as c6,
                mytest90.test9.column9 as c7,
                ref_1.column2 as c8,
                mytest90.test9.column7 as c9,
                mytest90.test9.column3 as c10,
                ref_1.column5 as c11,
                mytest90.test9.column2 as c12,
                mytest90.test9.column9 as c13,
                mytest90.test9.column4 as c14,
                mytest90.test9.column10 as c15
              from
                mytest90.test9 as ref_1
              where mytest90.test9.column10 is not NULL) as subq_0
          on (5 is NULL)
        left join (select
              mytest90.test9.column7 as c0
            from
              mytest90.test4 as ref_2
            where 20 is not NULL) as subq_1
        on (subq_1.c0 is NULL)
    where mytest90.test9.column9 is NULL
    limit 124);
```

Poc2:
```sql
INSERT INTO mytest90.test9 (column1, column10, column2, column3, column4, column5, column6, column7, column8, column9) VALUES (NULL, '2010-07-14 12:07:28', 742, 352, NULL, 367, NULL, '2002-02-17 18:05:14', 2.75, 57.80);
```

Architecture Information:
```ini
[NDBD DEFAULT]
NoOfReplicas = 2
DataMemory = 512M
IndexMemory = 64M

[NDB_MGMD]
NodeId = 1
hostname = 192.172.10.8
datadir = /var/lib/mysql-cluster

[NDBD]
NodeId = 2
hostname = 192.172.10.9
datadir = /usr/local/mysql-cluster/data
NodeGroup = 0

[NDBD]
NodeId = 3
hostname = 192.172.10.10
datadir = /usr/local/mysql-cluster/data
NodeGroup = 1

[NDBD]
NodeId = 4
hostname = 192.172.10.11
datadir = /usr/local/mysql-cluster/data
NodeGroup = 0

[NDBD]
NodeId = 5
hostname = 192.172.10.12
datadir = /usr/local/mysql-cluster/data
NodeGroup = 1

[mysqld]
NodeId = 6
hostname = 192.172.10.9

[mysqld]
NodeId = 7
hostname = 192.172.10.10

[mysqld]
NodeId = 8
hostname = 192.172.10.11

[mysqld]
NodeId = 9
hostname = 192.172.10.12
```

How to repeat:
Import the database file I provide later, then execute Poc1 and Poc2 at the same time on two different SQL nodes, as described above.
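The reproduction depends on Poc1 and Poc2 being submitted at the same moment on two different SQL nodes. One way to line that up is a small client-side harness. The following is a minimal Python sketch, not the reporter's own procedure; it assumes the standard `mysql` command-line client is on PATH and that the given user can log in without a password prompt. The helper names `mysql_argv`, `run_mysql` and `run_concurrently` are hypothetical.

```python
import subprocess
import threading
from typing import Callable, List, Optional, Tuple


# Assumption: the stock `mysql` client is on PATH and `user` can log in
# without a password prompt. Adjust the argv for your own credentials.
def mysql_argv(host: str, sql: str, user: str = "root") -> List[str]:
    """Build the command line that runs one SQL string against one SQL node."""
    return ["mysql", "-h", host, "-u", user, "-e", sql]


def run_mysql(host: str, sql: str) -> int:
    """Run one SQL string through the mysql client and return its exit code."""
    return subprocess.run(mysql_argv(host, sql)).returncode


def run_concurrently(
    jobs: List[Tuple[str, str]],
    runner: Callable[[str, str], int] = run_mysql,
) -> List[Optional[int]]:
    """Run every (host, sql) job in its own thread and release them together.

    A threading.Barrier holds each thread until all of them are ready, so the
    clients are launched at (approximately) the same moment. Each slot in the
    returned list holds that job's exit code, or None if the runner raised.
    """
    barrier = threading.Barrier(len(jobs))
    results: List[Optional[int]] = [None] * len(jobs)

    def worker(index: int, host: str, sql: str) -> None:
        barrier.wait()  # block until every worker has reached this point
        results[index] = runner(host, sql)

    threads = [
        threading.Thread(target=worker, args=(i, host, sql))
        for i, (host, sql) in enumerate(jobs)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

For example, `run_concurrently([("192.172.10.9", poc1_sql), ("192.172.10.10", poc2_sql)])` would send Poc1 to the SQL node with NodeId 6 and Poc2 to NodeId 7 from the config.ini above, where `poc1_sql` and `poc2_sql` are hypothetical strings holding the two statements. A nonzero exit code means that client reported an error. The barrier only aligns when the two clients start; how the statements actually interleave on the data nodes still varies, so several attempts may be needed.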

Suggested fix:
This bug seems similar to bug #114777, but I think they are different. This bug is probably caused by a data conflict during synchronization, which triggers some unexpected errors. However, the synchronization process for an ALTER statement differs from that for INSERT and DELETE statements, and the locks involved are different, so I think the two bugs are related but not the same.

I cannot reproduce this, and the log files do not show much. Can you:

1. shutdown cluster
2. delete all log files
3. start cluster
4. reproduce the issue
5. upload the new ndb_error_reporter output

Thanks

P.S. It would be good if you attached SQL scripts as .TXT files rather than as DOCX.

Hi,
As with bug #114778, there are no logs here.

Make sure to:

1. shut down: make sure all nodes (including the management node) are stopped
2. delete the logs
3. start the cluster

Hi,

The error logs are not here. In the logs that are here I do not see any bug. You had some restarts, probably because your cluster is not sized properly: you have heartbeat issues, probably due to CPU or network overload caused by a query that is not appropriate for NDB Cluster. In any case, I can't reproduce the issue and no logs are available.

I apologize, but I did reproduce the problem on my machine, and then ran the command "ndb_error_reporter /var/lib/mysql-cluster/config.ini root" on the management node to collect the zipped log files. I also followed your instructions and cleared the log files before reproducing it. I can't explain why the error does not appear in the log files, but the bug can be reproduced if you execute the SQL statements on the corresponding SQL nodes, in the specific order and at the specific moment I described. That is exactly what I did, and I reproduced it successfully.
