MySQL Bugs: #102502: Deadlock timeout due to problem in REDO log queue handling

Bug #102502	Deadlock timeout due to problem in REDO log queue handling
Submitted:	6 Feb 2021 18:15	Modified:	11 Oct 2021 13:22
Reporter:	Mikael Ronström
Status:	Closed
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S3 (Non-critical)
Version:	8.0.23	OS:	Ubuntu
Assigned to:		CPU Architecture:	x86

Description:
When loading data into DBT2, the load fails with a deadlock
timeout.
This happens because the method writePrepareLog is used to write
log entries from the REDO log queue, and the method needs some
additional checks to support that use.

How to repeat:
Run DBT2 and load 128 warehouses in a 2-node 2-replica setup with
4 LDMs in each data node. Use 8 DBT2 loaders.
This will fail after about 1-2 minutes of loading.

Suggested fix:
Ensure that the method writePrepareLog can handle writes from
the REDO log queue.

The workaround is most likely to increase the size of the REDO log
buffer or to reduce the loading speed.

Hi Mikael,

thanks for the report!

all best
Bogdan

Documented fix as follows in the NDB 8.0.29 changelog:

    When a redo log part is unable to accept an operation's log
    entry immediately, the operation (a prepare, commit, or abort)
    is queued, or (prepare only) optionally aborted. By default
    operations are queued.

    This mechanism was modified in 8.0.23 as part of decoupling
    local data managers and redo log parts, and introduced a
    regression whereby it was possible for queued operations to
    remain in the queued state until all activity on the log part
    quiesced. When this occurred, operations could remain queued
    until DBTC declared them timed out, and aborted them.
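
The queuing behaviour described in the changelog entry can be pictured
with a small toy model. The sketch below is only an illustration and is
not NDB's actual code: the LogPart class, the drain_only_when_idle flag,
the tick-based timeout, and the run() helper are all hypothetical names
invented here. It assumes a log part that queues an operation when its
buffer is full, and compares draining the queue whenever space frees up
with draining it only once the part is idle, which resembles the
regression described above.

```python
from collections import deque


class LogPart:
    """Toy model of one redo log part; NOT NDB's actual implementation."""

    def __init__(self, capacity, drain_only_when_idle):
        self.capacity = capacity      # entries the buffer can hold
        self.used = 0                 # entries currently buffered
        self.queue = deque()          # operations waiting for buffer space
        self.drain_only_when_idle = drain_only_when_idle  # hypothetical flag
        self.written = []             # operations whose entries were logged

    def _write(self, op):
        self.used += 1
        self.written.append(op)

    def submit(self, op):
        """Log the entry now, or queue the operation if the part is full."""
        if self.used < self.capacity and not self.queue:
            self._write(op)
        else:
            self.queue.append(op)

    def flush(self, n, new_activity):
        """Free n buffer slots, then try to drain the queue."""
        self.used = max(0, self.used - n)
        if self.drain_only_when_idle and new_activity:
            return  # regression-like behaviour: queue is ignored while busy
        while self.queue and self.used < self.capacity:
            self._write(self.queue.popleft())


def run(drain_only_when_idle, busy_ticks=5, timeout=3):
    """Queue one operation on a full log part and report what happens to it.

    The timeout stands in for DBTC declaring the operation timed out;
    the tick values are arbitrary assumptions for this sketch.
    """
    part = LogPart(capacity=2, drain_only_when_idle=drain_only_when_idle)
    for op in ("op1", "op2"):
        part.submit(op)               # fills the buffer
    part.submit("queued")             # no space left, so it is queued
    for tick in range(1, busy_ticks + 1):
        # Space frees up on every tick, but the part stays busy.
        part.flush(1, new_activity=True)
        if "queued" in part.written:
            return f"logged at tick {tick}"
        if tick >= timeout:
            return f"timed out at tick {tick}"
    return "still queued"


print(run(drain_only_when_idle=False))  # logged at tick 1
print(run(drain_only_when_idle=True))   # timed out at tick 3
```

In this model the queued operation is logged as soon as space frees up
when the queue is drained on every flush, while under the idle-only
policy it stays queued although space is available and eventually hits
the timeout.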

Closed.