Bug #114862 Scan error in NDB cluster
Submitted: 3 May 2024 4:42
Modified: 7 May 2024 12:04
Reporter: CunDi Fang
Status: Not a Bug
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S2 (Serious)
Version: 8.0.35-cluster MySQL Cluster Community S
OS: Any
Assigned to: MySQL Verification Team
CPU Architecture: Any

[3 May 2024 4:42] CunDi Fang
Description:
Hello, I found a bug in the 8.0.35-cluster version of MySQL Cluster. It causes "error 499". It may be the same issue as Bug #114849.

The details are as follows. It needs at least 4 nodes.

Node 1:
Poc:
```
update mytest90.test7 set column1 = mytest90.test7.column1;
```

result:
```
1713395528.670916 to 1713395530.24529 received:mysql_store_result() failed:Got temporary error 499 'Scan take over error, restart scan transaction' from NDBCLUSTER
```

Node 2:
Poc:
```
UPDATE mytest90.test7 SET column1 = 471, column2 = '2014-10-02 17:33:08', column5 = 304, column6 = 49.10, column7 = 74.23 WHERE ((column5 < NULL) AND column4 = 86.85) OR column4 > 7.85;
```

result:
```
1713395528.670558 to 1713395530.16978 received:mysql_store_result() failed:Lock wait timeout exceeded; try restarting transaction
```

Node 3:
Poc:
```
UPDATE mytest90.test7 SET column2 = '2005-05-12 06:44:03', column4 = 36.70, column5 = 33, column6 = 68.48, column7 = 24.09 WHERE (((column5 < NULL) OR column1 = 469) AND column1 < 194) OR column1 < 545;
```

result:
```
1713395528.678567 to 1713395530.44517 received:Query executed successfully, but no result set was returned.
```

Node 4:
Poc:
```
select
  ref_0.column1 as c0,
  (select total_latency from sys.x$user_summary_by_stages limit 1 offset 5)
     as c1
from
  mytest90.test5 as ref_0
where true
limit 88;
```

result:
```
1713395528.679083 to 1713395528.696545 received:Query executed successfully, but no result set was returned.
```

The conditions for this bug are fairly demanding: a lock wait timeout has to occur on node 2 before node 1 gets error 499.
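Both failing statements above come back with messages that ask the client to restart ("temporary error ... restart scan transaction" and "try restarting transaction"). A minimal client-side retry sketch in Python follows, assuming the statement is safe to re-run as a whole. The names `run_with_retry`, `is_retryable`, and the `execute` callable are hypothetical and are not part of any MySQL connector; the matching is done only on the message text quoted in this report.

```python
import time

# Substrings taken from the two error messages quoted in this report.
RETRYABLE_MARKERS = (
    "Got temporary error",
    "Lock wait timeout exceeded",
)


def is_retryable(exc):
    """Return True if the error text matches one of the messages above."""
    text = str(exc)
    return any(marker in text for marker in RETRYABLE_MARKERS)


def run_with_retry(execute, sql, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call execute(sql), retrying retryable failures with linear backoff.

    `execute` is a hypothetical callable supplied by the caller, for example
    a wrapper that opens a transaction, runs the statement, and commits.
    """
    for attempt in range(1, attempts + 1):
        try:
            return execute(sql)
        except Exception as exc:
            # Give up on the last attempt or on errors that do not look temporary.
            if attempt == attempts or not is_retryable(exc):
                raise
            sleep(base_delay * attempt)
```

With a retry like this in the client, the statement on node 1 would be resubmitted after error 499 instead of surfacing the failure, which is what the error text itself suggests.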

Architecture Information:
```
[NDBD DEFAULT]
NoOfReplicas =2
DataMemory = 512M
IndexMemory = 64M

[NDB_MGMD]
NodeId=1
hostname =192.172.10.8
datadir =/var/lib/mysql-cluster

[NDBD]
NodeId =2
hostname =192.172.10.9
datadir =/usr/local/mysql-cluster/data
NodeGroup=0
[NDBD]
NodeId =3
hostname =192.172.10.10
datadir =/usr/local/mysql-cluster/data
NodeGroup=1
[NDBD]
NodeId =4
hostname =192.172.10.11
datadir =/usr/local/mysql-cluster/data
NodeGroup=0
[NDBD]
NodeId =5
hostname =192.172.10.12
datadir =/usr/local/mysql-cluster/data
NodeGroup=1

[mysqld]
NodeId =6
hostname =192.172.10.9
[mysqld]
NodeId =7
hostname =192.172.10.10
[mysqld]
NodeId =8
hostname =192.172.10.11
[mysqld]
NodeId =9
hostname =192.172.10.12
```

How to repeat:
I've tried to reproduce it with tight control over the timing of the injected SQL, but I haven't been able to make the two UPDATE statements contend for a lock and produce a timeout, so I haven't reproduced it successfully. I'm hoping the developer team can trigger this lock wait timeout and see whether it leads to this bug.
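One way to get the lock wait without relying on timing might be to keep the first UPDATE's transaction open while the second one runs. The sketch below is untested against a cluster and rests on several assumptions: both connections are DB-API 2.0 connections with autocommit disabled, the two statements touch overlapping rows, and the waiter gives up once the data nodes' `TransactionDeadlockDetectionTimeout` expires. The function name `force_lock_wait` is hypothetical.

```python
def force_lock_wait(holder_conn, waiter_conn, holder_sql, waiter_sql):
    """Run holder_sql in an open transaction, then waiter_sql while locks are held.

    Both connections are assumed to be DB-API 2.0 connections with autocommit
    disabled, so the holder's row locks persist until rollback().
    Returns the exception raised by the waiter, or None if it succeeded.
    """
    holder_cur = holder_conn.cursor()
    waiter_cur = waiter_conn.cursor()
    holder_cur.execute(holder_sql)  # locks taken, transaction left open
    try:
        waiter_cur.execute(waiter_sql)  # expected to block, then time out
        waiter_conn.commit()
        return None
    except Exception as exc:
        waiter_conn.rollback()
        return exc
    finally:
        holder_conn.rollback()  # release the holder's locks
```

For the case in this report, the holder could be the node 3 UPDATE and the waiter the node 2 UPDATE, with the node 1 statement started from a third session while the waiter is blocked. Whether that ordering actually produces error 499 is not something this sketch can confirm.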

Suggested fix:
It is probably caused by a flaw in how the lock is handled.

[7 May 2024 8:19] MySQL Verification Team
This is not a bug.

[7 May 2024 8:21] MySQL Verification Team
Check Bug #86401.

[7 May 2024 12:04] CunDi Fang
It says "You do not have access to bug #86401."