| Bug #114880 | hang bug in mysql NDB cluster | ||
|---|---|---|---|
| Submitted: | 4 May 2024 13:43 | Modified: | 7 May 2024 12:53 |
| Reporter: | CunDi Fang | Email Updates: | |
| Status: | Unsupported | Impact on me: | |
| Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S2 (Serious) |
| Version: | 8.0.35-cluster MySQL Cluster Community S | OS: | Any |
| Assigned to: | MySQL Verification Team | CPU Architecture: | Any |
[7 May 2024 8:31]
MySQL Verification Team
This does not look like a bug but unsupported type of query for ndbcluster. During execution of this query - what is your cpu load on the nodes - what is your network load - what does your logs state Thanks
[7 May 2024 12:53]
CunDi Fang
Ok, I repeat again and here are the cpu load on the node.
Before I start the sql:
```
15900 mysql 20 0 2762692 400112 36864 S 1.3 0.0 0:03.75 mysqld
15975 root 20 0 1563572 1.3g 2048 S 1.0 0.1 0:02.88 ndbd
1 root 20 0 4624 2048 2048 S 0.0 0.0 0:00.04 bash
13 root 20 0 4624 2048 2048 S 0.0 0.0 0:00.02 bash
24 root 20 0 240796 4096 4096 S 0.0 0.0 0:08.28 fish
161 root 20 0 15428 6144 6144 S 0.0 0.0 0:00.02 sshd
12617 root 20 0 36272 8192 8192 T 0.0 0.0 0:00.04 mysql
15387 root 20 0 4624 2048 2048 S 0.0 0.0 0:00.02 bash
15399 root 20 0 7584 2048 2048 R 0.0 0.0 0:00.25 top
15554 root 20 0 2888 0 0 S 0.0 0.0 0:00.02 mysqld_safe
15836 root 20 0 36272 8192 8192 T 0.0 0.0 0:00.02 mysql
15974 root 20 0 25200 4228 2048 S 0.0 0.0 0:00.01 ndbd
16294 root 20 0 36272 8192 8192 S 0.0 0.0 0:00.02 mysql
```
When I start the sql:
```
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
15900 mysql 20 0 2762692 402160 36864 S 101.7 0.0 0:10.42 mysqld
15975 root 20 0 1563572 1.3g 2048 S 1.0 0.1 0:03.43 ndbd
1 root 20 0 4624 2048 2048 S 0.0 0.0 0:00.04 bash
13 root 20 0 4624 2048 2048 S 0.0 0.0 0:00.02 bash
24 root 20 0 240796 4096 4096 S 0.0 0.0 0:08.28 fish
161 root 20 0 15428 6144 6144 S 0.0 0.0 0:00.02 sshd
12617 root 20 0 36272 8192 8192 T 0.0 0.0 0:00.04 mysql
15387 root 20 0 4624 2048 2048 S 0.0 0.0 0:00.02 bash
15399 root 20 0 7584 2048 2048 R 0.0 0.0 0:00.29 top
15554 root 20 0 2888 0 0 S 0.0 0.0 0:00.02 mysqld_safe
15836 root 20 0 36272 8192 8192 T 0.0 0.0 0:00.02 mysql
15974 root 20 0 25200 4228 2048 S 0.0 0.0 0:00.02 ndbd
16294 root 20 0 36272 8192 8192 S 0.0 0.0 0:00.02 mysql
```
It is stuck and never gets over it, even after 10 mins.
```
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
17300 mysql 20 0 2893760 397824 36864 S 101.3 0.0 10:07.81 mysqld
16998 root 20 0 1563572 1.3g 4096 S 1.0 0.1 0:09.07 ndbd
15399 root 20 0 7584 2048 2048 R 0.3 0.0 0:01.20 top
1 root 20 0 4624 2048 2048 S 0.0 0.0 0:00.04 bash
13 root 20 0 4624 2048 2048 S 0.0 0.0 0:00.02 bash
24 root 20 0 240856 4096 4096 S 0.0 0.0 0:08.52 fish 161 root 20 0 15428 6144 6144 S 0.0 0.0 0:00.02 sshd 15387 root 20 0 4624 2048 2048 S 0.0 0.0 0:00.02 bash
16997 root 20 0 25200 4228 2048 S 0.0 0.0 0:00.16 ndbd
17086 root 20 0 2888 0 0 S 0.0 0.0 0:00.02 mysqld_safe
17382 root 20 0 36272 8192 8192 S 0.0 0.0 0:00.02 mysql
```
I have to ues "KILL -9" to stop mysqld.
```
^C^C -- query aborted
^C^C -- query aborted
^Zfish: Job 1, 'mysql -uroot -proot' has stopped
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
17300 mysql 20 0 2894792 397824 36864 S 100.7 0.0 11:08.59 mysqld
16998 root 20 0 1563572 1.3g 4096 S 1.0 0.1 0:09.75 ndbd
1 root 20 0 4624 2048 2048 S 0.0 0.0 0:00.04 bash 13 root 20 0 4624 2048 2048 S 0.0 0.0 0:00.02 bash 24 root 20 0 240856 4096 4096 S 0.0 0.0 0:08.52 fish
161 root 20 0 15428 6144 6144 S 0.0 0.0 0:00.02 sshd
15387 root 20 0 4624 2048 2048 S 0.0 0.0 0:00.02 bash
15399 root 20 0 7584 2048 2048 R 0.0 0.0 0:01.23 top
16997 root 20 0 25200 4228 2048 S 0.0 0.0 0:00.17 ndbd
17086 root 20 0 2888 0 0 S 0.0 0.0 0:00.02 mysqld_safe
17382 root 20 0 36272 8192 8192 T 0.0 0.0 0:00.02 mysql
```

Description: Hello, I found a hang bug in 8.0.35-cluster version of MYSQL cluster. The detail is as follow. Relying on NDB's own timeout mechanism doesn't handle this, he'll stay stuck. OS version and name: Ubuntu 22.04.3 LTS (Jammy Jellyfish) Linux eb1f47b08982 6.5.11-8-pve #1 SMP PREEMPT_DYNAMIC PMX 6.5.11-8 (2024-01-30T12:27Z) x86_64 x86_64 x86_64 GNU/Linux I am executing the following sql statement on a node: Poc: ``` elete from mytest100.test3 where EXISTS ( select distinct 25 as c0, ref_0.column3 as c1 from mytest100.test2 as ref_0 left join (select mytest100.test3.column7 as c0, ref_1.column4 as c1, mytest100.test3.column3 as c2, (select FIRST_SEEN from performance_schema.events_errors_summary_by_host_by_error limit 1 offset 6) as c3, mytest100.test3.column6 as c4, ref_1.column2 as c5, mytest100.test3.column6 as c6, mytest100.test3.column3 as c7, 79 as c8, ref_1.column4 as c9, mytest100.test3.column1 as c10, (select CURRENT_CONNECTIONS from performance_schema.accounts limit 1 offset 77) as c11, ref_1.column4 as c12, mytest100.test3.column1 as c13, mytest100.test3.column2 as c14, (select COUNT_ALLOC from performance_schema.memory_summary_by_thread_by_event_name limit 1 offset 4) as c15, ref_1.column1 as c16 from mytest100.test1 as ref_1 where true limit 111) as subq_0 left join mytest100.test9 as ref_2 on (EXISTS ( select mytest100.test3.column7 as c0, ref_3.column1 as c1, ref_3.column3 as c2, ref_3.column1 as c3, mytest100.test3.column7 as c4, 80 as c5, subq_0.c2 as c6, ref_2.column2 as c7, mytest100.test3.column1 as c8, subq_0.c11 as c9, 32 as c10, (select avg_tmp_tables_per_query from sys.statements_with_temp_tables limit 1 offset 4) as c11, mytest100.test3.column6 as c12, subq_0.c12 as c13, subq_0.c4 as c14, ref_3.column3 as c15, ref_2.column2 as c16, subq_0.c6 as c17, ref_3.column1 as c18, mytest100.test3.column2 as c19, ref_3.column5 as c20, mytest100.test3.column2 as c21, subq_0.c9 as c22, subq_0.c0 as c23 from mytest100.test2 as ref_3 where subq_0.c7 is NULL)) on (true) where subq_0.c2 is NULL limit 129); ``` Architecture Information: ''' [NDBD DEFAULT] NoOfReplicas =2 DataMemory = 512M IndexMemory = 64M [NDB_MGMD] NodeId=1 hostname =192.172.10.8 datadir =/var/lib/mysql-cluster [NDBD] NodeId =2 hostname =192.172.10.9 datadir =/usr/local/mysql-cluster/data NodeGroup=0 [NDBD] NodeId =3 hostname =192.172.10.10 datadir =/usr/local/mysql-cluster/data NodeGroup=1 [NDBD] NodeId =4 hostname =192.172.10.11 datadir =/usr/local/mysql-cluster/data NodeGroup=0 [NDBD] NodeId =5 hostname =192.172.10.12 datadir =/usr/local/mysql-cluster/data NodeGroup=1 [mysqld] NodeId =6 hostname =192.172.10.9 [mysqld] NodeId =7 hostname =192.172.10.10 [mysqld] NodeId =8 hostname =192.172.10.11 [mysqld] NodeId =9 hostname =192.172.10.12 ''' Attempted and successfully reproduced! How to repeat: Simply execute that PoC to repeat it. Suggested fix: I can't even get its execution plan via explain, there should be a problem with the overall logic.