Bug #105074 Take over error: DELETE immediately followed by INSERT
Submitted: 29 Sep 2021 11:18
Modified: 30 Sep 2021 11:47
Reporter: Mikael Ronström
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version: 8.0.23++
OS: Any
Assigned to: MySQL Verification Team
CPU Architecture: Any

[29 Sep 2021 11:18] Mikael Ronström
The following sequence triggers the problem:
1) A node is restarting.
2) An NDB API client DELETEs a row, and a different transaction immediately INSERTs the same row.

When 2) happens, the DELETE is COMMITted in the starting node but not yet
COMPLETEd. Since the starting node is a backup replica, the row is
therefore still locked.

The code in handle_nr_copy assumes that any INSERT will not see any locked
rows. There is no handling of a real-time break in this situation.

How to repeat:
Run the testcase testNodeRestart -n Bug16895311 T1
on a machine with many CPUs and using 3 replicas.

Suggested fix:
Ensure that starting nodes unlock the row already in the COMMIT phase.
This should ensure that INSERTs from normal transactions that arrive at the
starting node during the Copy phase cannot meet a locked row.
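
The scenario and the suggested fix, as first described in this report, can be illustrated with a toy model. This is only a sketch: the class StartingReplica, its methods, and the unlock_at_commit flag are hypothetical names invented for illustration and do not correspond to NDB source code. It assumes a single backup replica that locks a row at prepare time and releases the lock either at COMPLETE or, with the suggested change, already at COMMIT.

```python
class RowLockedError(Exception):
    """Raised when an INSERT meets a row still locked by another transaction."""


class StartingReplica:
    """Hypothetical toy model of a starting backup replica (not NDB code)."""

    def __init__(self, unlock_at_commit):
        # Assumption: this flag stands in for the suggested fix.
        self.unlock_at_commit = unlock_at_commit
        self.rows = {}   # primary key -> value
        self.locks = {}  # primary key -> owning transaction id

    def prepare_delete(self, txn, key):
        # The DELETE takes the row lock when it is prepared.
        self.locks[key] = txn

    def commit_delete(self, txn, key):
        # COMMIT removes the row; the lock is released here only
        # when the suggested fix is modelled.
        self.rows.pop(key, None)
        if self.unlock_at_commit:
            self._unlock(txn, key)

    def complete(self, txn, key):
        # COMPLETE always releases whatever lock the transaction still holds.
        self._unlock(txn, key)

    def _unlock(self, txn, key):
        if self.locks.get(key) == txn:
            del self.locks[key]

    def insert(self, txn, key, value):
        owner = self.locks.get(key)
        if owner is not None and owner != txn:
            raise RowLockedError(f"row {key!r} is locked by {owner}")
        self.rows[key] = value


def run_scenario(unlock_at_commit):
    """DELETE by T1, then INSERT by T2 arriving before T1's COMPLETE."""
    replica = StartingReplica(unlock_at_commit)
    replica.rows["pk1"] = "old"
    replica.prepare_delete("T1", "pk1")
    replica.commit_delete("T1", "pk1")
    try:
        replica.insert("T2", "pk1", "new")
        outcome = "inserted"
    except RowLockedError:
        outcome = "locked"
    replica.complete("T1", "pk1")
    return outcome
```

In this model, run_scenario(False) ends with the INSERT meeting a locked row, while run_scenario(True) lets the INSERT through because the lock was released at COMMIT. The model says nothing about whether real NDB nodes behave this way.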
[29 Sep 2021 12:25] Mikael Ronström
After some deeper investigation, it seems that the bug is caused by
the node order in normal transactions being wrong. I will investigate further.
Ignore this bug report for now.
[29 Sep 2021 13:02] MySQL Verification Team
Thanks Mikael, let us know if you discover something.

all best
[30 Sep 2021 11:47] Mikael Ronström
Not a bug in NDB