MySQL Bugs: #102713: Queued REDO log record crashes with locked REDO log parts

Bug #102713	Queued REDO log record crashes with locked REDO log parts
Submitted:	24 Feb 2021 0:28	Modified:	25 Feb 2021 3:37
Reporter:	Mikael Ronström
Status:	Verified
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S2 (Serious)
Version:		OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
When executing a queued REDO log record, we call get_table_frag_record. This call is used to get the instance number, and it can actually fail if the fragment record doesn't belong to the owner of the REDO log. So the ndbrequire is not correct. It is better to separate this call into a new function, get_table_frag_instance, that avoids getting the table and fragment pointers.
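To make the failure mode concrete, here is a minimal C++ model, not NDB source. It assumes each fragment is owned by exactly one LDM instance and that a per-LDM lookup like get_table_frag_record only resolves fragments owned by the calling instance. `FragInfo`, `lookup_frag_record`, `instance_for_queued_record` and `require` are hypothetical names; `require` stands in for ndbrequire, which aborts the data node and is modelled here as an exception.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>

// Hypothetical stand-in for ndbrequire: the real macro crashes the data
// node; this model throws so the failure can be observed.
inline void require(bool cond) {
  if (!cond) throw std::logic_error("ndbrequire failed");
}

struct FragInfo {
  std::uint32_t owner_instance;  // LDM instance that owns the fragment
};

// Hypothetical per-LDM lookup: it only finds fragments owned by the
// calling LDM instance and returns nullptr otherwise.
inline const FragInfo* lookup_frag_record(
    const std::map<std::uint32_t, FragInfo>& frags, std::uint32_t frag_id,
    std::uint32_t caller_instance) {
  auto it = frags.find(frag_id);
  if (it == frags.end() || it->second.owner_instance != caller_instance)
    return nullptr;
  return &it->second;
}

// Queued REDO handling as described in the report: it only needs the
// instance number, but the lookup result is guarded by a require, so it
// "crashes" whenever the caller is not the fragment's owner.
inline std::uint32_t instance_for_queued_record(
    const std::map<std::uint32_t, FragInfo>& frags, std::uint32_t frag_id,
    std::uint32_t caller_instance) {
  const FragInfo* frag = lookup_frag_record(frags, frag_id, caller_instance);
  require(frag != nullptr);
  return frag->owner_instance;
}
```

With fragment 7 owned by instance 2, calling `instance_for_queued_record(frags, 7, 2)` returns 2, while the same call from instance 1 throws, mirroring the crash when a queued record is handled by a non-owning LDM.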

How to repeat:
Setup a configuration with more LDM threads than REDO log parts.
Execute heavy inserts to ensure that the REDO log gets queued entries
(using a small REDO log buffer helps).
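The repro depends on REDO writes being queued when a log part's buffer is full and later drained by whichever LDM gets there next. The sketch below models only that queueing. The fixed record capacity, the FIFO queue and the names `LogRecord`, `LogPart`, `append` and `drain_one` are assumptions for illustration, not the NDB implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <optional>

struct LogRecord {
  std::uint32_t owner_instance;  // LDM that produced the record
  std::uint32_t frag_id;
};

// Hypothetical REDO log part shared by several LDM threads.
struct LogPart {
  std::size_t capacity;          // models a small REDO log buffer
  std::size_t used = 0;          // records currently in the buffer
  std::deque<LogRecord> queued;  // writes that did not fit

  explicit LogPart(std::size_t cap) : capacity(cap) {}

  // Returns true if written directly, false if the record was queued.
  bool append(const LogRecord& rec) {
    if (used < capacity && queued.empty()) {
      ++used;
      return true;
    }
    queued.push_back(rec);
    return false;
  }

  // Frees one buffer slot (e.g. after a flush) and drains one queued
  // record into it. The caller may be any LDM sharing this part.
  std::optional<LogRecord> drain_one() {
    if (used > 0) --used;
    if (queued.empty()) return std::nullopt;
    LogRecord rec = queued.front();
    queued.pop_front();
    ++used;
    return rec;
  }
};
```

With a capacity of one record, a write from LDM 0 goes straight into the buffer and a following write from LDM 1 is queued. If LDM 0 is the thread that then calls `drain_one()`, it ends up handling a record produced by LDM 1, which is the situation that trips the lookup in the Description.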

Suggested fix:
Convert the call to get_table_frag_record into a call to a new
method, get_table_frag_instance, that avoids fetching the table
and fragment records. Make it a void function, which removes the
ndbrequire that crashes.
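A minimal sketch of the suggested fix, assuming the owning instance can be read from a node-wide fragment-to-instance map so that no table or fragment record of the calling LDM is touched and there is nothing to require. Only the name get_table_frag_instance and its void return come from the report; the output parameter, the map argument, `TableFragKey`, `InstanceMap` and the "unknown fragment yields 0" behaviour are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical key: (table id, fragment id).
using TableFragKey = std::pair<std::uint32_t, std::uint32_t>;

// Hypothetical node-wide map from fragment to owning LDM instance.
using InstanceMap = std::map<TableFragKey, std::uint32_t>;

// Void function: returns the owning instance through an output parameter
// and never asserts on which LDM is calling it. An unknown fragment
// yields 0 in this model (an assumption; real code may differ).
inline void get_table_frag_instance(const InstanceMap& instances,
                                    std::uint32_t table_id,
                                    std::uint32_t frag_id,
                                    std::uint32_t& instance_out) {
  auto it = instances.find(TableFragKey{table_id, frag_id});
  instance_out = (it != instances.end()) ? it->second : 0;
}
```

Because the caller's identity plays no role, an LDM draining a queued record that belongs to another LDM simply learns the owner's instance number instead of hitting a failed ndbrequire.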

Hi Mikael, thanks for the report; I managed to reproduce it :)

all best
Bogdan

A special case with 3 LDMs and 4 log parts hits a problem where
the queued log record is handled by the wrong LDM. The variable
that decides whether to use the log mutex can be used instead
in get_table_frag_record*.
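The idea in this comment can be illustrated with a toy mapping. Assume, purely for illustration, that fragment f is owned by LDM f % num_ldms and logged in part f % num_parts. With 3 LDMs and 4 log parts, part 0 then receives records from all three LDMs, so the part needs the log mutex and a queued record on it may be drained by a non-owner. The hypothetical `use_log_mutex` flag tells the lookup that this is expected; the strict owner check is kept only when parts are not shared. `owner_ldm`, `log_part_of`, `compute_use_log_mutex` and `lookup_is_consistent` are illustrative names, not NDB functions.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Illustrative mapping (an assumption, not NDB's real distribution).
inline std::uint32_t owner_ldm(std::uint32_t frag, std::uint32_t num_ldms) {
  return frag % num_ldms;
}
inline std::uint32_t log_part_of(std::uint32_t frag, std::uint32_t num_parts) {
  return frag % num_parts;
}

// A log part needs the log mutex if more than one LDM writes to it. The
// mapping repeats every lcm(num_ldms, num_parts) fragments, which divides
// num_ldms * num_parts, so scanning that many fragments is enough.
inline bool compute_use_log_mutex(std::uint32_t num_ldms,
                                  std::uint32_t num_parts) {
  std::vector<std::set<std::uint32_t>> writers(num_parts);
  for (std::uint32_t f = 0; f < num_ldms * num_parts; ++f)
    writers[log_part_of(f, num_parts)].insert(owner_ldm(f, num_ldms));
  for (const auto& w : writers)
    if (w.size() > 1) return true;
  return false;
}

// Hypothetical check: returns false only where a strict lookup would
// fire ndbrequire. When parts are shared, a non-owning caller is expected.
inline bool lookup_is_consistent(std::uint32_t frag, std::uint32_t caller,
                                 std::uint32_t num_ldms, bool use_log_mutex) {
  if (owner_ldm(frag, num_ldms) == caller) return true;
  return use_log_mutex;
}
```

Under this toy mapping, 3 LDMs with 4 parts and 8 LDMs with 4 parts both need the mutex, while 4 LDMs with 4 parts and 2 LDMs with 4 parts do not. Fragment 4 is owned by LDM 1 but logged in part 0, so LDM 0 draining that part is consistent only when `use_log_mutex` is set.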

In addition, the handling of a queued prepare log write sets
logPartState to IDLE before executing the log record. This is too
early, and it interacts badly with an ndbrequire that checks
logPartState == ACTIVE when called from queued log handling.
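The ordering problem can be shown with a tiny state model. Only the IDLE and ACTIVE states and the ACTIVE check come from the report; `LogPartModel`, `execute_queued_record` and the two `prepare_queued_write_*` functions are hypothetical, and the "fixed" variant is one plausible reordering (execute while ACTIVE, then go IDLE), not the committed patch.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical subset of log part states; only the two named in the
// report are modelled.
enum class LogPartState { IDLE, ACTIVE };

struct LogPartModel {
  LogPartState state = LogPartState::ACTIVE;
};

// Stand-in for the ndbrequire in queued log handling: the record may only
// be executed while the part is still ACTIVE.
inline void execute_queued_record(const LogPartModel& part) {
  if (part.state != LogPartState::ACTIVE)
    throw std::logic_error("ndbrequire(logPartState == ACTIVE) failed");
  // Writing the actual log record is omitted in this model.
}

// Order described in the report: IDLE is set too early.
inline void prepare_queued_write_buggy(LogPartModel& part) {
  part.state = LogPartState::IDLE;
  execute_queued_record(part);
}

// Reordered: execute while still ACTIVE, then release the part.
inline void prepare_queued_write_fixed(LogPartModel& part) {
  execute_queued_record(part);
  part.state = LogPartState::IDLE;
}
```

The buggy order throws on the first queued record, while the reordered version executes it and leaves the part IDLE afterwards.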