Bug #102502 Deadlock timeout due to problem in REDO log queue handling
Submitted: 6 Feb 2021 18:15
Modified: 11 Oct 2021 13:22
Reporter: Mikael Ronström
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version: 8.0.23
OS: Ubuntu
Assigned to:
CPU Architecture: x86

[6 Feb 2021 18:15] Mikael Ronström
When loading data into DBT2, the load fails with a deadlock timeout.
This happens because the method writePrepareLog is also used to write
log entries from the REDO log queue, and it needs some additional
checks to actually support that.

How to repeat:
Run DBT2 and load 128 warehouses in a 2-node 2-replica setup with
4 LDMs in each data node. Use 8 DBT2 loaders.
This will fail after about 1-2 minutes of loading.

Suggested fix:
Ensure that the method writePrepareLog can handle writes from the
REDO log queue.

A likely workaround is to increase the size of the REDO log buffer or
to decrease the speed of loading.
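
As a rough illustration of the buffer-size workaround, the REDO log buffer of the data nodes is sized with the RedoBuffer parameter in the cluster's config.ini. The fragment below is only a sketch: the 64M value is an assumed, illustrative figure and was not tested against this bug, and NoOfReplicas=2 simply mirrors the 2-replica setup described above.

```ini
# Illustrative config.ini excerpt; the RedoBuffer value is an assumption,
# not a setting verified to avoid the timeout.
[ndbd default]
NoOfReplicas=2
RedoBuffer=64M
```

A larger buffer only makes it less likely that a log part has to queue operations; it does not change how queued operations are handled.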
[8 Feb 2021 11:20] MySQL Verification Team
Hi Mikael,

Thanks for the report!

All best
[11 Oct 2021 13:22] Jon Stephens
Documented fix as follows in the NDB 8.0.29 changelog:

    When a redo log part is unable to accept an operation's log
    entry immediately, the operation (a prepare, commit, or abort)
    is queued, or (prepare only) optionally aborted. By default
    operations are queued.

    This mechanism was modified in 8.0.23 as part of decoupling
    local data managers and redo log parts, and introduced a
    regression whereby it was possible for queued operations to
    remain in the queued state until all activity on the log part
    quiesced. When this occurred, operations could remain queued
    until DBTC declared them timed out, and aborted them.
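
The behaviour described in the changelog entry can be sketched with a small toy model. This is not the NDB code: the LogPart type, its fields, and the first_queued_op_starved helper are all hypothetical names invented for illustration, and the "drain only when idle" rule is an assumed simplification of the regression. The model shows how a queued operation can be starved while new operations keep arriving, which in the real system would last until DBTC times the operation out.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical toy model of one redo log part; not the NDB implementation.
struct LogPart {
    std::size_t capacity;        // buffer slots available when empty
    std::size_t used = 0;        // slots currently occupied
    std::deque<int> queued;      // ids of operations waiting for space
    std::vector<int> written;    // ids of operations logged, in order
    bool drain_only_when_idle;   // true models the assumed regression

    LogPart(std::size_t cap, bool regress)
        : capacity(cap), drain_only_when_idle(regress) {}

    // Log the operation if there is room, otherwise queue it.
    void submit(int op) {
        if (used < capacity) {
            ++used;
            written.push_back(op);
        } else {
            queued.push_back(op);
        }
        assert(used <= capacity);
    }

    // One buffer slot has been flushed to disk.
    void flush_one() {
        if (used > 0) --used;
        // Regression model: queued operations are only picked up once
        // the log part has no activity left at all.
        if (drain_only_when_idle && used > 0) return;
        while (!queued.empty() && used < capacity) {
            ++used;
            written.push_back(queued.front());
            queued.pop_front();
        }
    }
};

// Keep the log part busy for `rounds` flush/submit cycles and report
// whether the first queued operation (id 3) is still waiting.
bool first_queued_op_starved(bool regress, int rounds) {
    LogPart part(2, regress);
    part.submit(1);
    part.submit(2);              // buffer is now full
    part.submit(3);              // has to be queued
    int next = 4;
    for (int i = 0; i < rounds; ++i) {
        part.flush_one();
        part.submit(next++);     // sustained load: part never goes idle
    }
    return !part.queued.empty() && part.queued.front() == 3;
}
```

Calling first_queued_op_starved(true, 10) models the regression: operation 3 is still queued after ten rounds because newer operations keep taking the freed slot. With first_queued_op_starved(false, 10), the queue is drained on every flush, so operation 3 is written after the first flush.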