Bug #114849 Scan error in NDB cluster
Submitted: 2 May 2024 9:53 Modified: 7 May 2024 11:57
Reporter: CunDi Fang Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S2 (Serious)
Version:8.0.35-cluster MySQL Cluster Community S OS:Any (20.04)
Assigned to: MySQL Verification Team CPU Architecture:Any

[2 May 2024 9:53] CunDi Fang
Description:
Hello, I found a bug in 8.0.35-cluster version of MYSQL cluster. It will cause "error 499".

The detail is as follow. It need at least 4 nodes.

Node 1:
Poc:
```
delete from mytest90.test9
where
(((mytest90.test9.column8 is not NULL)
      and ((EXISTS (
          select
              ref_1.column2 as c0
            from              mytest90.test1 as ref_0
                inner join mytest90.test1 as ref_1
                  right join mytest90.test4 as ref_2
                  on (true)
                on (true),
              lateral (select
                    mytest90.test9.column2 as c0,
                    mytest90.test9.column9 as c1,
                    ref_2.column4 as c2,
                    (select cpu_interrupt_time from ndbinfo.cpudata_1sec limit 1 offset 6)
                       as c3,
                    ref_1.column1 as c4,
                    mytest90.test9.column2 as c5,
                    ref_2.column4 as c6,
                    98 as c7,
                    mytest90.test9.column6 as c8,
                    ref_1.column2 as c9,
                    ref_3.column7 as c10,
                    ref_2.column4 as c11
                  from
                    mytest90.test7 as ref_3
                  where false) as subq_0
            where (EXISTS (
                select
                    (select PAGE_FAULTS_MINOR from information_schema.PROFILING limit 1 offset 2)
                       as c0,
                    51 as c1
                  from
                    mytest90.test2 as ref_4
                  where (false)
                    or ((false)
                      and (((false)
                          or (25 is NULL))
                        or (EXISTS (
                          select
                              ref_5.column4 as c0,
                              subq_0.c4 as c1,
                              (select SECONDARY_ENGINE_ATTRIBUTE from information_schema.TABLES_EXTENSIONS limit 1 offset 1)
                                 as c2
                            from
                              mytest90.test1 as ref_5
                            where (subq_0.c4 is not NULL)
                              or ((true)
                                or (false))))))
                  limit 134))
              or (true)
            limit 39))
        or ((mytest90.test9.column5 is not NULL)
          or ((mytest90.test9.column9 is not NULL)
            and (((false)
                and (false))
              or (false))))))
    or (true))
  or (((70 is not NULL)
      and (mytest90.test9.column4 is not NULL))
    or (true));
```

result:
```
1713386961.441228 to 1713386962.839522 received:Query executed successfully, but no result set was returned.
```

Node 2:
Poc:
```
update mytest90.test9 set column10 = mytest90.test9.column8, column2 = mytest90.test9.column5, column4 = mytest90.test9.column10, column7 = mytest90.test9.column7;
```

result:
```
1713386961.442722 to 1713386962.839131 received:mysql_store_result() failed:Got temporary error 499 'Scan take over error, restart scan transaction' from NDBCLUSTER
```

Node 3:
Poc:
```
update mytest90.test9 set column1 = true, column10 = mytest90.test9.column4, column2 = mytest90.test9.column3, column3 = mytest90.test9.column5, column8 = mytest90.test9.column4, column9 = case when 30 is NULL then mytest90.test9.column10 else mytest90.test9.column10 end ;
```

result:
```
1713386961.441399 to 1713386962.834928 received:mysql_store_result() failed:Lock wait timeout exceeded; try restarting transaction
```

Node 4:
Poc:
```
UPDATE mytest90.test9 SET column1 = NULL, column10 = 96.45, column2 = 624, column4 = 67.26, column5 = 608, column6 = NULL, column7 = '2001-12-01 07:31:27', column8 = 42.33, column9 = 49.22 WHERE (column6 < 165) OR column3 < 738;
```

result:
```
1713386961.441364 to 1713386962.776532 received:mysql_store_result() failed:Lock wait timeout exceeded; try restarting transaction
```

The conditions for this bug to occur are a bit more demanding, requiring a lock wait timeout to occur on both node 3 and node 4's update statements before node 2 gets error 499.

Architecture Information:
'''
[NDBD DEFAULT]
NoOfReplicas =2
DataMemory = 512M
IndexMemory = 64M

[NDB_MGMD]
NodeId=1
hostname =192.172.10.8
datadir =/var/lib/mysql-cluster

[NDBD]
NodeId =2
hostname =192.172.10.9
datadir =/usr/local/mysql-cluster/data
NodeGroup=0
[NDBD]
NodeId =3
hostname =192.172.10.10
datadir =/usr/local/mysql-cluster/data
NodeGroup=1
[NDBD]
NodeId =4
hostname =192.172.10.11
datadir =/usr/local/mysql-cluster/data
NodeGroup=0
[NDBD]
NodeId =5
hostname =192.172.10.12
datadir =/usr/local/mysql-cluster/data
NodeGroup=1

[mysqld]
NodeId =6
hostname =192.172.10.9
[mysqld]
NodeId =7
hostname =192.172.10.10
[mysqld]
NodeId =8
hostname =192.172.10.11
[mysqld]
NodeId =9
hostname =192.172.10.12
'''

How to repeat:
I've tried to reproduce it and have tight control over the timing of the injected sql, but I haven't been able to get the two update statements to have a lock contention and generate a timeout, so I haven't been able to reproduce it successfully. I'm hoping that the developer team will again be able to trigger this lock timeout tool and see if they can trigger this bug.

Suggested fix:
It should be caused by a vulnerability in the behavioral handling of the lock.
[7 May 2024 8:22] MySQL Verification Team
Not a bug, check Bug #86401
[7 May 2024 11:57] CunDi Fang
It shows that "You do not have access to bug #86401.".