Bug #105724 LQHKEYREQ races COMMIT signal in COPY_FRAGREQ situations
Submitted: 26 Nov 2021 19:15
Modified: 28 Nov 2021 6:00
Reporter: Mikael Ronström
Status: Verified
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version: 8.0.27
OS: Any
Assigned to:
CPU Architecture: Any

[26 Nov 2021 19:15] Mikael Ronström
Description:
This bug is very rare, but seems fairly straightforward to understand.

The bug requires:
1) A transaction that deletes PK=x in rowid=x
2) A transaction that inserts PK=x in rowid=y
3) x != y

The problem is that the COMMIT signal of the DELETE operation, which releases
the lock in the primary, arrives in the starting node after the LQHKEYREQ that wants
to insert. As a result, the deleted row is still present in the starting node when the
INSERT arrives there. This insert finds no row in rowid=x but finds a row in
rowid=y, and this row is even locked!

This leads to a crash since the code is not designed for this case.
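
The ordering problem can be illustrated with a small toy model. This is only a sketch and not NDB code: the StartingNode class, the replay helper and the PK-to-row table are hypothetical names invented for the illustration, and the model captures only the signal ordering, not the rowid handling or the actual crash.

```python
from dataclasses import dataclass, field


@dataclass
class StartingNode:
    """Toy fragment on the starting node: PK -> (rowid, locked). Hypothetical."""
    rows: dict = field(default_factory=dict)

    def commit_delete(self, pk):
        # COMMIT of the DELETE removes the row and releases its lock.
        self.rows.pop(pk, None)
        return "committed"

    def insert(self, pk, rowid):
        # The INSERT assumes the PK is gone; a leftover row is the bug case.
        if pk in self.rows:
            _, locked = self.rows[pk]
            state = "locked" if locked else "unlocked"
            return f"conflict: PK={pk} still present and {state}"
        self.rows[pk] = (rowid, False)
        return "ok"


def replay(signals):
    # Start with PK "x" stored and locked by the not-yet-committed DELETE.
    node = StartingNode(rows={"x": (1, True)})
    results = []
    for kind, args in signals:
        if kind == "COMMIT":
            results.append(node.commit_delete(*args))
        else:  # "LQHKEYREQ" carrying the INSERT
            results.append(node.insert(*args))
    return results


# Expected order: the COMMIT arrives first, so the INSERT succeeds.
in_order = replay([("COMMIT", ("x",)), ("LQHKEYREQ", ("x", 2))])
# Raced order: the LQHKEYREQ overtakes the COMMIT and meets a locked row.
raced = replay([("LQHKEYREQ", ("x", 2)), ("COMMIT", ("x",))])
```

In this model the in-order replay inserts cleanly, while the raced replay reports a conflict on a row that is still locked, which is the state the real code is not designed to handle.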

How to repeat:
testIndex -r 10 -n NF_Mixed T1 T6 T13

Suggested fix:
Disable PACKED signal sending of COMMIT and COMPLETE signals during
COPY_FRAGREQ execution.
It could also be considered to never use PACKED signals for COMMIT and COMPLETE
signals between LQHs, since packing could cause error 899 as well.
[26 Nov 2021 19:29] Mikael Ronström
It should be enough to ensure that COMMIT/COMPLETE signals sent after the unlock of
INSERT/DELETE operations or multi-row changes are not sent as PACKED signals.
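
A minimal sketch of that narrower rule, assuming a hypothetical predicate that the sender consults before choosing between a PACKED and a direct signal. The names Op and may_use_packed are invented for illustration and do not correspond to identifiers in the NDB source.

```python
from enum import Enum


class Op(Enum):
    READ = "read"
    UPDATE = "update"
    INSERT = "insert"
    DELETE = "delete"


def may_use_packed(op, multi_row_change):
    """Return True if COMMIT/COMPLETE for this operation may be sent PACKED.

    Hypothetical rule: INSERT/DELETE and multi-row changes always use a
    direct signal, so their COMMIT/COMPLETE cannot be delayed in a pack buffer.
    """
    if op in (Op.INSERT, Op.DELETE):
        return False
    if multi_row_change:
        return False
    return True
```

With this rule a single-row UPDATE could still use the PACKED path, while the DELETE from the scenario above would be sent directly.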
[28 Nov 2021 6:00] MySQL Verification Team
Thanks for the report, Mikael.

Kind regards,
Bogdan