Description:
Hello, I found a hang bug in 8.0.35-cluster version of MYSQL cluster.
The detail is as follow.
OS version and name:
Ubuntu 22.04.3 LTS (Jammy Jellyfish)
Linux eb1f47b08982 6.5.11-8-pve #1 SMP PREEMPT_DYNAMIC PMX 6.5.11-8 (2024-01-30T12:27Z) x86_64 x86_64 x86_64 GNU/Linux
When I execute my select statement on one node and an insert statement on another node at the same time, it gets the node executing the select statement stuck and relies on mysql's own timeout mechanism to not be able to handle and jump out of this stuck state.
I am executing this select statement at one of the nodes:
Poc 1:
```
select
ref_4.column3 as c0,
(select TABLE_ID from information_schema.INNODB_TEMP_TABLE_INFO limit 1 offset 2)
as c1,
case when EXISTS (
select
ref_8.column6 as c0,
ref_8.column2 as c1,
subq_0.c11 as c2
from
mytest90.test3 as ref_8
where ref_8.column6 is NULL
limit 59) then ref_6.column7 else ref_6.column7 end
as c2,
ref_6.column4 as c3,
ref_6.column7 as c4
from
mytest90.test6 as ref_0
left join (select
ref_1.column3 as c0,
ref_1.column3 as c1,
ref_1.column1 as c2,
ref_1.column3 as c3,
ref_1.column1 as c4,
ref_1.column1 as c5,
ref_1.column1 as c6,
ref_1.column1 as c7,
ref_1.column3 as c8,
ref_1.column1 as c9,
(select column66 from mytest90.test2 limit 1 offset 5)
as c10,
(select tab_status from ndbinfo.table_distribution_status limit 1 offset 5)
as c11,
ref_1.column1 as c12,
ref_1.column3 as c13,
ref_1.column1 as c14
from
mytest90.test1 as ref_1
where (ref_1.column3 is NULL)
or (ref_1.column3 is not NULL)
limit 97) as subq_0
inner join mytest90.test3 as ref_2
left join mytest90.test3 as ref_3
inner join mytest90.test0 as ref_4
right join mytest90.test3 as ref_5
on (false)
on (true)
inner join mytest90.test7 as ref_6
on ((false)
or (ref_5.column3 is NULL))
right join mytest90.test3 as ref_7
on (ref_5.column7 = ref_7.column7 )
on (true)
on (ref_3.column65 is not NULL)
on (ref_0.column1 = ref_2.column65 )
where (select NAME from performance_schema.setup_consumers limit 1 offset 4)
is not NULL
limit 88;
```
And at the same time this INSERT statement is performed at another node:
Poc 2:
```
insert into mytest90.test1 values (
82,
case when EXISTS (
select
ref_0.column60 as c0,
ref_0.column9 as c1,
ref_0.column6 as c2,
ref_0.column5 as c3,
(select SUM_ERROR_RAISED from performance_schema.events_errors_summary_by_host_by_error limit 1 offset 4)
as c4,
ref_0.column10 as c5,
ref_0.column9 as c6,
ref_0.column7 as c7
from
mytest90.test9 as ref_0
where ((select error_code from ndbinfo.error_messages limit 1 offset 6)
is not NULL) and (ref_0.column10 is NULL)
limit 132) then 92 else 92 end
);
```
I find that the insert statement executes successfully, but the select statement gets stuck.
And when I replace the INSERT statement with the following INSERT statement, the same condition happens and the node that executes the SELECT statement also gets stuck and fails to jump out.
Poc 3:
```
insert into mytest90.test1 values (
case when 57 is NULL then 88 else 88 end
,
case when ((((((false)
or (((8 is not NULL)
and (true))
and ((50 is not NULL)
or (false))))
and (true))
or (((15 is NULL)
and (true))
and (97 is NULL)))
or (false))
and (73 is not NULL))
and ((EXISTS (
select
ref_0.column1 as c0,
ref_0.column1 as c1
from
mytest90.test6 as ref_0
where ref_0.column1 is not NULL
limit 51))
and (((71 is not NULL)
or (true))
or (true))) then 100 else 100 end
);
```
Architecture Information:
'''
[NDBD DEFAULT]
NoOfReplicas =2
DataMemory = 512M
IndexMemory = 64M
[NDB_MGMD]
NodeId=1
hostname =192.172.10.8
datadir =/var/lib/mysql-cluster
[NDBD]
NodeId =2
hostname =192.172.10.9
datadir =/usr/local/mysql-cluster/data
NodeGroup=0
[NDBD]
NodeId =3
hostname =192.172.10.10
datadir =/usr/local/mysql-cluster/data
NodeGroup=1
[NDBD]
NodeId =4
hostname =192.172.10.11
datadir =/usr/local/mysql-cluster/data
NodeGroup=0
[NDBD]
NodeId =5
hostname =192.172.10.12
datadir =/usr/local/mysql-cluster/data
NodeGroup=1
[mysqld]
NodeId =6
hostname =192.172.10.9
[mysqld]
NodeId =7
hostname =192.172.10.10
[mysqld]
NodeId =8
hostname =192.172.10.11
[mysqld]
NodeId =9
hostname =192.172.10.12
'''
Attempted and successfully reproduced!
How to repeat:
Simply execute that PoC to repeat it.
Suggested fix:
seems like a dead loop