Bug #114759 hang bug in mysql NDB cluster
Submitted: 24 Apr 2024 8:52 Modified: 25 Apr 2024 1:43
Reporter: CunDi Fang Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S2 (Serious)
Version:8.0.35-cluster MySQL Cluster Community S OS:Any (20.04)
Assigned to: MySQL Verification Team CPU Architecture:Any

[24 Apr 2024 8:52] CunDi Fang
Description:
Hello, I found a hang bug in 8.0.35-cluster version of MYSQL cluster. 
The detail is as follow.

OS version and name:
Ubuntu 22.04.3 LTS (Jammy Jellyfish)
Linux eb1f47b08982 6.5.11-8-pve #1 SMP PREEMPT_DYNAMIC PMX 6.5.11-8 (2024-01-30T12:27Z) x86_64 x86_64 x86_64 GNU/Linux

When I execute my select statement on one node and an insert statement on another node at the same time, it gets the node executing the select statement stuck and relies on mysql's own timeout mechanism to not be able to handle and jump out of this stuck state.

I am executing this select statement at one of the nodes:
Poc 1:
```
select
  ref_4.column3 as c0,
  (select TABLE_ID from information_schema.INNODB_TEMP_TABLE_INFO limit 1 offset 2)
     as c1,
  case when EXISTS (
      select
          ref_8.column6 as c0,
          ref_8.column2 as c1,
          subq_0.c11 as c2
        from
          mytest90.test3 as ref_8
        where ref_8.column6 is NULL
        limit 59) then ref_6.column7 else ref_6.column7 end
     as c2,
  ref_6.column4 as c3,
  ref_6.column7 as c4
from
  mytest90.test6 as ref_0
    left join (select
            ref_1.column3 as c0,
            ref_1.column3 as c1,
            ref_1.column1 as c2,
            ref_1.column3 as c3,
            ref_1.column1 as c4,
            ref_1.column1 as c5,
            ref_1.column1 as c6,
            ref_1.column1 as c7,
            ref_1.column3 as c8,
            ref_1.column1 as c9,
            (select column66 from mytest90.test2 limit 1 offset 5)
               as c10,
            (select tab_status from ndbinfo.table_distribution_status limit 1 offset 5)
               as c11,
            ref_1.column1 as c12,
            ref_1.column3 as c13,
            ref_1.column1 as c14
          from
            mytest90.test1 as ref_1
          where (ref_1.column3 is NULL)
            or (ref_1.column3 is not NULL)
          limit 97) as subq_0
      inner join mytest90.test3 as ref_2
        left join mytest90.test3 as ref_3
              inner join mytest90.test0 as ref_4
                right join mytest90.test3 as ref_5
                on (false)
              on (true)
            inner join mytest90.test7 as ref_6
            on ((false)
                or (ref_5.column3 is NULL))
          right join mytest90.test3 as ref_7
          on (ref_5.column7 = ref_7.column7 )
        on (true)
      on (ref_3.column65 is not NULL)
    on (ref_0.column1 = ref_2.column65 )
where (select NAME from performance_schema.setup_consumers limit 1 offset 4)
     is not NULL
limit 88;
```

And at the same time this INSERT statement is performed at another node:
Poc 2:
```
insert into mytest90.test1 values (
82,
case when EXISTS (
    select
        ref_0.column60 as c0,
        ref_0.column9 as c1,
        ref_0.column6 as c2,
        ref_0.column5 as c3,
        (select SUM_ERROR_RAISED from performance_schema.events_errors_summary_by_host_by_error limit 1 offset 4)
           as c4,
        ref_0.column10 as c5,
        ref_0.column9 as c6,
        ref_0.column7 as c7
      from
        mytest90.test9 as ref_0
      where ((select error_code from ndbinfo.error_messages limit 1 offset 6)
             is not NULL)        and (ref_0.column10 is NULL)
      limit 132) then 92 else 92 end
  );
```
I find that the insert statement executes successfully, but the select statement gets stuck.

And when I replace the INSERT statement with the following INSERT statement, the same condition happens and the node that executes the SELECT statement also gets stuck and fails to jump out.

Poc 3:
```
insert into mytest90.test1 values (
case when 57 is NULL then 88 else 88 end
  ,
case when ((((((false)
              or (((8 is not NULL)
                  and (true))
                and ((50 is not NULL)
                  or (false))))
            and (true))
          or (((15 is NULL)
              and (true))
            and (97 is NULL)))
        or (false))
      and (73 is not NULL))
    and ((EXISTS (
        select
            ref_0.column1 as c0,
            ref_0.column1 as c1
          from
            mytest90.test6 as ref_0
          where ref_0.column1 is not NULL
          limit 51))
      and (((71 is not NULL)
          or (true))
        or (true))) then 100 else 100 end
  );
```

Architecture Information:
'''
[NDBD DEFAULT]
NoOfReplicas =2
DataMemory = 512M
IndexMemory = 64M

[NDB_MGMD]
NodeId=1
hostname =192.172.10.8
datadir =/var/lib/mysql-cluster

[NDBD]
NodeId =2
hostname =192.172.10.9
datadir =/usr/local/mysql-cluster/data
NodeGroup=0
[NDBD]
NodeId =3
hostname =192.172.10.10
datadir =/usr/local/mysql-cluster/data
NodeGroup=1
[NDBD]
NodeId =4
hostname =192.172.10.11
datadir =/usr/local/mysql-cluster/data
NodeGroup=0
[NDBD]
NodeId =5
hostname =192.172.10.12
datadir =/usr/local/mysql-cluster/data
NodeGroup=1

[mysqld]
NodeId =6
hostname =192.172.10.9
[mysqld]
NodeId =7
hostname =192.172.10.10
[mysqld]
NodeId =8
hostname =192.172.10.11
[mysqld]
NodeId =9
hostname =192.172.10.12
'''

Attempted and successfully reproduced!

How to repeat:
Simply execute that PoC to repeat it.

Suggested fix:
seems like a dead loop
[24 Apr 2024 11:31] MySQL Verification Team
Duplicate of Bug #114464
[25 Apr 2024 1:43] CunDi Fang
Thanks for your reply, but I feel like there seems to be some difference with bug #114464? Because when I try to reproduce it, if I execute select alone, it doesn't get stuck in a similar way, but only when it's executed at the same time as insert? What is the reason for this?