Bug #102713 Queued REDO log record crashes with locked REDO log parts
Submitted: 24 Feb 2021 0:28 Modified: 25 Feb 2021 3:37
Reporter: Mikael Ronström
Status: Verified
Category: MySQL Cluster: Cluster (NDB) storage engine Severity: S2 (Serious)
Version: OS: Any
Assigned to: CPU Architecture: Any

[24 Feb 2021 0:28] Mikael Ronström
When executing a queued REDO log we call get_table_frag_record. This call is used to get the instance number and actually can fail if the fragment record doesn't belong to the owner of the REDO log. So the ndbrequire is not correct. Actually better to separate this call into a new function get_table_frag_instance that avoids getting table and fragment pointers.

How to repeat:
Set up a configuration with more LDM threads than REDO log parts.
Execute heavy inserts to ensure that the REDO log gets queued entries
(using a small REDO log buffer helps).

Suggested fix:
Convert the call to get_table_frag_record into a call to a new
method, get_table_frag_instance, that avoids fetching the table
and fragment records. Make it a void function, which avoids the
ndbrequire that crashes.
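
The suggested split can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the NDB source: ToyLdm, ToyFragment, kNoInstance and the map-based storage are hypothetical, and only the two function names come from the report. The strict lookup refuses fragments owned by another instance (the case where the real ndbrequire would fire), while the instance-only lookup is void and just reports the owner.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical toy model; not the actual NDB kernel data structures.
constexpr uint32_t kNoInstance = UINT32_MAX;  // hypothetical "unknown" marker

struct ToyFragment {
  uint32_t instance;  // LDM instance that owns this fragment
};

class ToyLdm {
 public:
  explicit ToyLdm(uint32_t own_instance) : own_instance_(own_instance) {}

  void add_fragment(uint32_t table_id, uint32_t frag_id, uint32_t instance) {
    frags_[std::make_pair(table_id, frag_id)] = ToyFragment{instance};
  }

  // Strict lookup: returns the record only if this instance owns it.
  // Returns nullptr otherwise; a caller that asserts on the result
  // would crash for fragments owned by another instance.
  const ToyFragment* get_table_frag_record(uint32_t table_id,
                                           uint32_t frag_id) const {
    auto it = frags_.find(std::make_pair(table_id, frag_id));
    if (it == frags_.end() || it->second.instance != own_instance_) {
      return nullptr;
    }
    return &it->second;
  }

  // Instance-only lookup: void, no ownership check, no record pointer.
  // Writes kNoInstance when the fragment is unknown to this toy model.
  void get_table_frag_instance(uint32_t table_id, uint32_t frag_id,
                               uint32_t& instance) const {
    auto it = frags_.find(std::make_pair(table_id, frag_id));
    if (it == frags_.end()) {
      instance = kNoInstance;
    } else {
      instance = it->second.instance;
    }
  }

 private:
  uint32_t own_instance_;
  std::map<std::pair<uint32_t, uint32_t>, ToyFragment> frags_;
};

// Usage: a queued record refers to a fragment owned by another instance.
inline void demo_queued_lookup() {
  ToyLdm ldm(1);               // this toy LDM is instance 1
  ldm.add_fragment(10, 0, 1);  // owned locally
  ldm.add_fragment(10, 1, 2);  // owned by instance 2
  assert(ldm.get_table_frag_record(10, 1) == nullptr);  // strict lookup fails
  uint32_t instance = 0;
  ldm.get_table_frag_instance(10, 1, instance);
  assert(instance == 2);  // instance-only lookup still answers
}
```

In this sketch the queued-record path only needs the instance number, so it never has to assert on ownership.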
[24 Feb 2021 17:41] MySQL Verification Team
Hi Mikael, thanks for the report, I managed to reproduce it :)

all best
[25 Feb 2021 3:35] Mikael Ronström
A special case with 3 LDMs and 4 log parts hits a problem where
the queued log record is handled by the wrong LDM. The variable
that decides whether to use the log mutex can be used instead
in get_table_frag_record*
[25 Feb 2021 3:37] Mikael Ronström
In addition, the prepare queued log write sets logPartState to IDLE
before executing the log record. This is too early and interacts
badly with an ndbrequire that checks logPartState == ACTIVE when
called from queued log handling.
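
The ordering problem can be shown with a minimal toy state machine. This is a sketch under stated assumptions rather than the NDB code: ToyLogPart, toy_require and the handler names are hypothetical, and toy_require throws instead of crashing the node the way ndbrequire does. Only logPartState, IDLE and ACTIVE are taken from the report.

```cpp
#include <cassert>
#include <stdexcept>

enum class LogPartState { IDLE, ACTIVE };

// Hypothetical toy log part; not the real NDB log part record.
struct ToyLogPart {
  LogPartState logPartState = LogPartState::ACTIVE;
  int executed = 0;  // number of queued records executed
};

// Hypothetical stand-in for ndbrequire: throws instead of crashing.
inline void toy_require(bool cond) {
  if (!cond) throw std::logic_error("toy_require failed");
}

// Executing a queued record requires the log part to be ACTIVE.
inline void execute_queued_record(ToyLogPart& part) {
  toy_require(part.logPartState == LogPartState::ACTIVE);
  ++part.executed;
}

// Order described in the report: the state is reset before execution.
inline void handle_queued_idle_too_early(ToyLogPart& part) {
  part.logPartState = LogPartState::IDLE;  // too early
  execute_queued_record(part);             // the requirement fails here
}

// Alternative order: execute first, then mark the log part IDLE.
inline void handle_queued_idle_after(ToyLogPart& part) {
  execute_queued_record(part);
  part.logPartState = LogPartState::IDLE;
}

// Usage: the early reset trips the check, the late reset does not.
inline void demo_log_part_state_order() {
  ToyLogPart early;
  bool failed = false;
  try {
    handle_queued_idle_too_early(early);
  } catch (const std::logic_error&) {
    failed = true;
  }
  assert(failed && early.executed == 0);

  ToyLogPart late;
  handle_queued_idle_after(late);
  assert(late.executed == 1);
  assert(late.logPartState == LogPartState::IDLE);
}
```

Whether the real fix moves the state change or relaxes the check is not stated in the report; the sketch only demonstrates why the early reset conflicts with the ACTIVE requirement.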