Bug #114880 hang bug in mysql NDB cluster
Submitted: 4 May 2024 13:43 Modified: 7 May 2024 12:53
Reporter: CunDi Fang Email Updates:
Status: Unsupported Impact on me:
None 
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S2 (Serious)
Version:8.0.35-cluster MySQL Cluster Community S OS:Any
Assigned to: MySQL Verification Team CPU Architecture:Any

[4 May 2024 13:43] CunDi Fang
Description:
Hello, I found a hang bug in 8.0.35-cluster version of MYSQL cluster. 
The detail is as follow.

Relying on NDB's own timeout mechanism doesn't handle this, he'll stay stuck.

OS version and name:
Ubuntu 22.04.3 LTS (Jammy Jellyfish)
Linux eb1f47b08982 6.5.11-8-pve #1 SMP PREEMPT_DYNAMIC PMX 6.5.11-8 (2024-01-30T12:27Z) x86_64 x86_64 x86_64 GNU/Linux

I am executing the following sql statement on a node:

Poc:
```
elete from mytest100.test3
where 
EXISTS (
  select distinct
      25 as c0,
      ref_0.column3 as c1
    from
      mytest100.test2 as ref_0
        left join (select
                mytest100.test3.column7 as c0,
                ref_1.column4 as c1,
                mytest100.test3.column3 as c2,
                (select FIRST_SEEN from performance_schema.events_errors_summary_by_host_by_error limit 1 offset 6)
                   as c3,
                mytest100.test3.column6 as c4,
                ref_1.column2 as c5,
                mytest100.test3.column6 as c6,
                mytest100.test3.column3 as c7,
                79 as c8,
                ref_1.column4 as c9,
                mytest100.test3.column1 as c10,
                (select CURRENT_CONNECTIONS from performance_schema.accounts limit 1 offset 77)
                   as c11,
                ref_1.column4 as c12,
                mytest100.test3.column1 as c13,
                mytest100.test3.column2 as c14,
                (select COUNT_ALLOC from performance_schema.memory_summary_by_thread_by_event_name limit 1 offset 4)
                   as c15,
                ref_1.column1 as c16
              from
                mytest100.test1 as ref_1
              where true
              limit 111) as subq_0          left join mytest100.test9 as ref_2          on (EXISTS (
              select
                  mytest100.test3.column7 as c0,
                  ref_3.column1 as c1,
                  ref_3.column3 as c2,                  ref_3.column1 as c3,                  mytest100.test3.column7 as c4,
                  80 as c5,
                  subq_0.c2 as c6,
                  ref_2.column2 as c7,
                  mytest100.test3.column1 as c8,                  subq_0.c11 as c9,                  32 as c10,
                  (select avg_tmp_tables_per_query from sys.statements_with_temp_tables limit 1 offset 4)
                     as c11,
                  mytest100.test3.column6 as c12,
                  subq_0.c12 as c13,
                  subq_0.c4 as c14,
                  ref_3.column3 as c15,
                  ref_2.column2 as c16,
                  subq_0.c6 as c17,
                  ref_3.column1 as c18,
                  mytest100.test3.column2 as c19,
                  ref_3.column5 as c20,
                  mytest100.test3.column2 as c21,
                  subq_0.c9 as c22,
                  subq_0.c0 as c23
                from
                  mytest100.test2 as ref_3
                where subq_0.c7 is NULL))
        on (true)
    where subq_0.c2 is NULL
    limit 129);
```

Architecture Information:
'''
[NDBD DEFAULT]
NoOfReplicas =2
DataMemory = 512M
IndexMemory = 64M

[NDB_MGMD]
NodeId=1
hostname =192.172.10.8
datadir =/var/lib/mysql-cluster

[NDBD]
NodeId =2
hostname =192.172.10.9
datadir =/usr/local/mysql-cluster/data
NodeGroup=0
[NDBD]
NodeId =3
hostname =192.172.10.10
datadir =/usr/local/mysql-cluster/data
NodeGroup=1
[NDBD]
NodeId =4
hostname =192.172.10.11
datadir =/usr/local/mysql-cluster/data
NodeGroup=0
[NDBD]
NodeId =5
hostname =192.172.10.12
datadir =/usr/local/mysql-cluster/data
NodeGroup=1

[mysqld]
NodeId =6
hostname =192.172.10.9
[mysqld]
NodeId =7
hostname =192.172.10.10
[mysqld]
NodeId =8
hostname =192.172.10.11
[mysqld]
NodeId =9
hostname =192.172.10.12
'''

Attempted and successfully reproduced!

How to repeat:
Simply execute that PoC to repeat it.

Suggested fix:
I can't even get its execution plan via explain, there should be a problem with the overall logic.
[7 May 2024 8:31] MySQL Verification Team
This does not look like a bug but unsupported type of query for ndbcluster.

During execution of this query
 - what is your cpu load on the nodes
 - what is your network load
 - what does your logs state

Thanks
[7 May 2024 12:53] CunDi Fang
Ok, I repeat again and here are the cpu load on the node.

Before I start the sql:
```
  15900 mysql     20   0 2762692 400112  36864 S   1.3   0.0   0:03.75 mysqld                                                                
  15975 root      20   0 1563572   1.3g   2048 S   1.0   0.1   0:02.88 ndbd                                                                  
      1 root      20   0    4624   2048   2048 S   0.0   0.0   0:00.04 bash                                                                  
     13 root      20   0    4624   2048   2048 S   0.0   0.0   0:00.02 bash                                                                  
     24 root      20   0  240796   4096   4096 S   0.0   0.0   0:08.28 fish                                                                  
    161 root      20   0   15428   6144   6144 S   0.0   0.0   0:00.02 sshd                                                                  
  12617 root      20   0   36272   8192   8192 T   0.0   0.0   0:00.04 mysql                                                                 
  15387 root      20   0    4624   2048   2048 S   0.0   0.0   0:00.02 bash                                                                  
  15399 root      20   0    7584   2048   2048 R   0.0   0.0   0:00.25 top                                                                   
  15554 root      20   0    2888      0      0 S   0.0   0.0   0:00.02 mysqld_safe                                                           
  15836 root      20   0   36272   8192   8192 T   0.0   0.0   0:00.02 mysql                                                                 
  15974 root      20   0   25200   4228   2048 S   0.0   0.0   0:00.01 ndbd                                                                  
  16294 root      20   0   36272   8192   8192 S   0.0   0.0   0:00.02 mysql 
```

When I start the sql:
```
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                               
  15900 mysql     20   0 2762692 402160  36864 S 101.7   0.0   0:10.42 mysqld                                                                
  15975 root      20   0 1563572   1.3g   2048 S   1.0   0.1   0:03.43 ndbd                                                                  
      1 root      20   0    4624   2048   2048 S   0.0   0.0   0:00.04 bash                                                                  
     13 root      20   0    4624   2048   2048 S   0.0   0.0   0:00.02 bash                                                                  
     24 root      20   0  240796   4096   4096 S   0.0   0.0   0:08.28 fish                                                                  
    161 root      20   0   15428   6144   6144 S   0.0   0.0   0:00.02 sshd                                                                  
  12617 root      20   0   36272   8192   8192 T   0.0   0.0   0:00.04 mysql                                                                 
  15387 root      20   0    4624   2048   2048 S   0.0   0.0   0:00.02 bash                                                                  
  15399 root      20   0    7584   2048   2048 R   0.0   0.0   0:00.29 top                                                                   
  15554 root      20   0    2888      0      0 S   0.0   0.0   0:00.02 mysqld_safe                                                           
  15836 root      20   0   36272   8192   8192 T   0.0   0.0   0:00.02 mysql                                                                 
  15974 root      20   0   25200   4228   2048 S   0.0   0.0   0:00.02 ndbd                                                                  
  16294 root      20   0   36272   8192   8192 S   0.0   0.0   0:00.02 mysql
```

It is stuck and never gets over it, even after 10 mins.
```
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                               
  17300 mysql     20   0 2893760 397824  36864 S 101.3   0.0  10:07.81 mysqld                                                                
  16998 root      20   0 1563572   1.3g   4096 S   1.0   0.1   0:09.07 ndbd                                                                  
  15399 root      20   0    7584   2048   2048 R   0.3   0.0   0:01.20 top                                                                   
      1 root      20   0    4624   2048   2048 S   0.0   0.0   0:00.04 bash                                                                  
     13 root      20   0    4624   2048   2048 S   0.0   0.0   0:00.02 bash                                                                  
     24 root      20   0  240856   4096   4096 S   0.0   0.0   0:08.52 fish                                                                      161 root      20   0   15428   6144   6144 S   0.0   0.0   0:00.02 sshd                                                                    15387 root      20   0    4624   2048   2048 S   0.0   0.0   0:00.02 bash                                                                  
  16997 root      20   0   25200   4228   2048 S   0.0   0.0   0:00.16 ndbd                                                                  
  17086 root      20   0    2888      0      0 S   0.0   0.0   0:00.02 mysqld_safe                                                           
  17382 root      20   0   36272   8192   8192 S   0.0   0.0   0:00.02 mysql 
```

I have to ues "KILL -9" to stop mysqld.
```

^C^C -- query aborted
^C^C -- query aborted
^Zfish: Job 1, 'mysql -uroot -proot' has stopped

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                               
  17300 mysql     20   0 2894792 397824  36864 S 100.7   0.0  11:08.59 mysqld                                                                
  16998 root      20   0 1563572   1.3g   4096 S   1.0   0.1   0:09.75 ndbd                                                                  
      1 root      20   0    4624   2048   2048 S   0.0   0.0   0:00.04 bash                                                                       13 root      20   0    4624   2048   2048 S   0.0   0.0   0:00.02 bash                                                                       24 root      20   0  240856   4096   4096 S   0.0   0.0   0:08.52 fish                                                                  
    161 root      20   0   15428   6144   6144 S   0.0   0.0   0:00.02 sshd                                                                  
  15387 root      20   0    4624   2048   2048 S   0.0   0.0   0:00.02 bash                                                                  
  15399 root      20   0    7584   2048   2048 R   0.0   0.0   0:01.23 top                                                                   
  16997 root      20   0   25200   4228   2048 S   0.0   0.0   0:00.17 ndbd                                                                  
  17086 root      20   0    2888      0      0 S   0.0   0.0   0:00.02 mysqld_safe                                                           
  17382 root      20   0   36272   8192   8192 T   0.0   0.0   0:00.02 mysql 
```