Bug #102647 Replica sql_thread getting stuck due to MASTER_DELAY
Submitted: 18 Feb 2021 16:33 Modified: 23 Jan 5:29
Reporter: Alok Pathak Email Updates:
Status: Won't fix Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:5.7, 5.7.33 OS:Any
Assigned to: CPU Architecture:Any

[18 Feb 2021 16:33] Alok Pathak
Description:
The replication applier thread in a multi-threaded replica can randomly get stuck in the "Waiting until MASTER_DELAY seconds after master executed event" state and doesn't apply any transaction for a long time when the replica is configured with MASTER_DELAY.

This issue is happening with the following configuration options:
1. Multi-threaded replication (slave_parallel_workers > 1).
2. GTID replication
3. MASTER_DELAY configured on the replica.

The following output confirms that Slave_SQL_Running_State got stuck in "Waiting until MASTER_DELAY seconds after master executed event" while Seconds_Behind_Master > SQL_Delay.

# mysql -e"show slave status\G" | grep -iE 'seconds|delay'
        Seconds_Behind_Master: 74795
                    SQL_Delay: 14400
          SQL_Remaining_Delay: 8752
      Slave_SQL_Running_State: Waiting until MASTER_DELAY seconds after master executed event

The Relay_Master_Log_File and Exec_Master_Log_Pos don't change for hours.

while true; 
do echo "========================";
mysql -e"show slave status\G" | grep -iE 'relay_master_log_file|Exec_Master_Log_Pos'; 
sleep 5; 
done
=============================================
        Relay_Master_Log_File: mysql-bin.008275
          Exec_Master_Log_Pos: 86759722
=============================================
        Relay_Master_Log_File: mysql-bin.008275
          Exec_Master_Log_Pos: 86759722
=============================================
        Relay_Master_Log_File: mysql-bin.008275
          Exec_Master_Log_Pos: 86759722
=============================================
        Relay_Master_Log_File: mysql-bin.008275
          Exec_Master_Log_Pos: 86759722
.......

The workaround is to disable the parallel replication and sql_thread resumes normally.

Please review https://jira.percona.com/browse/PS-7498 for more details.

Confirmed the behaviour in the following version:
[root@db1 ~]# rpm -qa | grep -i mysql-community
mysql-community-common-5.7.33-1.el7.x86_64
mysql-community-client-5.7.33-1.el7.x86_64
mysql-community-libs-compat-5.7.33-1.el7.x86_64
mysql-community-libs-5.7.33-1.el7.x86_64
mysql-community-server-5.7.33-1.el7.x86_64

How to repeat:
1. Setup a Source/Replica pair of MySQL 5.7.33 version using the my.cnf noted below.

2. Configure GTID replication.
CHANGE MASTER TO
MASTER_HOST = 'Master_IP',
MASTER_PORT = 3306,
MASTER_USER = 'repl',
MASTER_PASSWORD = 'xxxxx',
MASTER_AUTO_POSITION = 1;

3. Verify that parallel replication is enabled and configure sql_delay.

mysql> show global variables where Variable_name in ('slave_preserve_commit_order','slave_parallel_type','slave_parallel_workers');
+-----------------------------+---------------+
| Variable_name | Value |
+-----------------------------+---------------+
| slave_parallel_type | LOGICAL_CLOCK |
| slave_parallel_workers | 4 |
| slave_preserve_commit_order | ON |
+-----------------------------+---------------+
3 rows in set (0.08 sec)

STOP SLAVE;
CHANGE MASTER TO MASTER_DELAY=1800;
START SLAVE;
 

4. Run sysbench workload on the master server.

 
sysbench oltp_read_write --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-user=<USER> --mysql-password="xxxxx" --mysql-db=test --table-size=200000 --tables=4 prepare

sysbench oltp_write_only --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-user=<USER> --mysql-password="xxxx" --mysql-db=test --report-interval=1 --rate=0 --threads=8 --time=120000 --table_size=200000 --tables=4 run
 

5. Monitor the replication status and wait until Seconds_Behind_Master is greater than SQL_Delay.

6. After replicating for 5-15 minutes (approx.) or processing a few relay logs files, the Exec_Master_Log_Pos would be stuck at the same position for the period of sql_delay seconds. 

For example:
mysql> show slave status\G
*************************** 1. row ***************************
...
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
...
          Exec_Master_Log_Pos: 602902238
...
        Seconds_Behind_Master: 3135
...
                    SQL_Delay: 1800
          SQL_Remaining_Delay: 465
      Slave_SQL_Running_State: Waiting until MASTER_DELAY seconds after master executed event
...

Sample output showing that sql_thread isn't moving while Seconds_Behind_Master > SQL_Delay. Replica thinks that it needs to wait for SQL_Remaining_Delay to become zero.

Please use the following configuration files.

=== Source my.cnf ===

[mysql]
port                           = 3306
socket                         = /var/lib/mysql/mysql.sock

[mysqld]
collation-server               = utf8_unicode_ci
character-set-server           = utf8
sql-mode                       = "STRICT_ALL_TABLES,STRICT_TRANS_TABLES"

user                           = mysql
default-storage-engine         = InnoDB
socket                         = /var/lib/mysql/mysql.sock
pid-file                       = /var/lib/mysql/mysql.pid
tmpdir                         = /tmp
explicit_defaults_for_timestamp= 1
bind-address                   = 0.0.0.0
skip-name-resolve              = ON

max-connect-errors             = 1000000
sysdate-is-now                 = 1

datadir                        = /var/lib/mysql

expire-logs-days               = 4
binlog-format                  = ROW
log_slave_updates
gtid-mode                      = ON
enforce-gtid-consistency       = ON

tmp-table-size                 = 32M
max-heap-table-size            = 32M
query-cache-type               = 0
query-cache-size               = 0
thread-cache-size              = 128
open-files-limit               = 65535
table-definition-cache         = 1024
max_allowed_packet             = 1G
net_buffer_length              = 1M
thread_cache_size              = 6000

innodb-flush-method            = O_DIRECT
innodb-log-files-in-group      = 2
innodb-flush-log-at-trx-commit = 2
innodb-file-per-table          = 1
innodb_page_cleaners           = 48

log-error                      = /var/lib/mysql/mysql-error.log
log-queries-not-using-indexes  = 1
slow-query-log-file            = /var/lib/mysql/mysql-slow.log
master_info_repository          = TABLE
server-id                       = 1197145515
report-host                     = 192.168.33.10
local_infile                    = OFF
automatic_sp_privileges         = FALSE
old_passwords                   = 2

innodb-max-dirty-pages-pct=75
read_only = 1
super_read_only = 0
skip_ssl

log-bin                        = mysql-bin
max_binlog_size                = 1G
max-connections                = 5000
table_open_cache               = 10000
innodb_adaptive_hash_index     = 0
long_query_time                = 0

innodb_purge_threads           = 8
innodb_write_io_threads        = 32
innodb_read_io_threads         = 32
innodb_flush_neighbors         = 0
innodb_thread_concurrency      = 32

log_output=file
slow_query_log=ON
long_query_time=1
innodb_monitor_enable=all
max_user_connections = 5000
max_connections = 5050
thread_cache_size = 4000
innodb_autoinc_lock_mode = 2
innodb_sync_array_size = 1
innodb_open_files = 2000
skip_symbolic_links
table_open_cache_instances = 4

sync_binlog = 0
slave_net_timeout=250
innodb_lru_scan_depth           = 4096

innodb-io-capacity             = 1000
innodb-io-capacity-max         = 2000
innodb_flushing_avg_loops      = 200
relay-log-space-limit           = 1G

slave_rows_search_algorithms="INDEX_SCAN,HASH_SCAN"
binlog_group_commit_sync_no_delay_count = 8
slave_parallel_type=LOGICAL_CLOCK
slave_preserve_commit_order=1
slave_parallel_workers = 4

=== Replica my.cnf ===
[mysql]
port                           = 3306
socket                         = /var/lib/mysql/mysql.sock

[mysqld]
collation-server               = utf8_unicode_ci
character-set-server           = utf8
sql-mode                       = "STRICT_ALL_TABLES,STRICT_TRANS_TABLES"

user                           = mysql
default-storage-engine         = InnoDB
socket                         = /var/lib/mysql/mysql.sock
pid-file                       = /var/lib/mysql/mysql.pid
tmpdir                         = /tmp
explicit_defaults_for_timestamp= 1
bind-address                   = 0.0.0.0
skip-name-resolve              = ON

max-connect-errors             = 1000000
sysdate-is-now                 = 1

datadir                        = /var/lib/mysql/

expire-logs-days               = 4
binlog-format                  = ROW
log_slave_updates
gtid-mode                      = ON
enforce-gtid-consistency       = ON

tmp-table-size                 = 32M
max-heap-table-size            = 32M
query-cache-type               = 0
query-cache-size               = 0
thread-cache-size              = 128
open-files-limit               = 65535
table-definition-cache         = 1024
max_allowed_packet             = 1G
net_buffer_length              = 1M
thread_cache_size              = 6000

innodb-flush-method            = O_DIRECT
innodb-log-files-in-group      = 2

innodb-flush-log-at-trx-commit = 2
innodb-file-per-table          = 1

innodb_page_cleaners           = 16

log-error                      = /var/lib/mysql/mysql-error.log
log-queries-not-using-indexes  = 1
slow-query-log-file            = /var/lib/mysql/mysql-slow.log
master_info_repository          = TABLE
server-id                       = 237023917
report-host                     = 192.168.33.20
local_infile                    = OFF
automatic_sp_privileges         = FALSE
old_passwords                   = 2

innodb-max-dirty-pages-pct=75
read_only = 1
super_read_only = 0

skip_ssl

log-bin                        = mysql-bin
max_binlog_size                = 1G
max-connections                = 5000
table_open_cache               = 10000
innodb_adaptive_hash_index     = 0
long_query_time                = 0

innodb_purge_threads           = 8
innodb_write_io_threads        = 32
innodb_read_io_threads         = 32
innodb_flush_neighbors         = 0
innodb_thread_concurrency      = 32

log_output=file
slow_query_log=ON
long_query_time=1
innodb_monitor_enable=all
max_user_connections = 5000
max_connections = 5050
thread_cache_size = 4000
innodb_autoinc_lock_mode = 2
innodb_sync_array_size = 1
innodb_open_files = 2000
skip_symbolic_links
table_open_cache_instances = 4

sync_binlog = 0
slave_net_timeout=250
innodb_lru_scan_depth           = 4096

innodb-io-capacity             = 1000
innodb-io-capacity-max         = 2000
innodb_flushing_avg_loops      = 200
relay_log_purge = OFF
innodb_buffer_pool_instances   = 8

slave_rows_search_algorithms="INDEX_SCAN,HASH_SCAN"
binlog_group_commit_sync_no_delay_count = 8
slave_parallel_type=LOGICAL_CLOCK
slave_preserve_commit_order=1
slave_parallel_workers = 4
[19 Feb 2021 7:34] Venkatesh Prasad Venugopal
Problem
-------
The replication applier thread in a multi-threaded replica can randomly get stuck in

Waiting until MASTER_DELAY seconds after master executed event
and doesn't apply any transaction for a long time when the replica is configured with MASTER_DELAY.

Background
----------
After the bugfix of BUG#17326020, when the receiver thread reconnects to its source using GTID auto positioning while in the middle of a transaction, it leaves the partial transaction on the relaylog as is and will fully retrieve the same transaction again. While doing so, the receiver thread, as per GTID auto-position protocol will write master's FORMAT_DESCRIPTION_EVENT with log_pos != 0 into the relay log after each (re)connection.

 

This FORMAT_DESCRIPTION_EVENT shall look like below in the relay log.

===BEGIN===
# at 4
#210215 16:18:05 server id 1 end_log_pos 123 CRC32 0x1120fc25 Start: binlog v 4, server v 5.7.32-35-debug-log created 210215 16:18:05
# This Format_description_event appears in a relay log and was generated by the slave thread.
# at 123
#210215 16:18:05 server id 1 end_log_pos 154 CRC32 0xa29c149c Previous-GTIDs
# [empty]
# at 154
#700101 5:30:00 server id 1111 end_log_pos 0 CRC32 0x93c56c74 Rotate to mysql-bin.000001 pos: 4
# at 201
#210215 14:19:18 server id 1111 end_log_pos 123 CRC32 0x25802c3f Start: binlog v 4, server v 5.7.32-35-debug-log created 210215 14:19:18 at startup
ROLLBACK/*!*/;
BINLOG '
jjUqYA9XBAAAdwAAAHsAAAAAAAQANS43LjMyLTM1LWRlYnVnLWxvZwAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAACONSpgEzgNAAgAEgAEBAQEEgAAXwAEGggAAAAICAgCAAAACgoKKioAEjQA
AT8sgCU=
'/*!*/;
# at 320
#210215 16:18:05 server id 0 end_log_pos 367 CRC32 0xdd12fd05 Rotate to mysql-bin.000001 pos: 154
# at 367
===BEGIN===
 

When the applier thread starts applying the FORMAT_DESCRIPTION_EVENT (at position 201), it checks if it is in the middle of a group. If it is, it will queue to the current worker a QUERY(ROLLBACK) event to make the worker gracefully finish its work before starting to apply the new (complete) copy of the group, and only after that it will let the MTS coordinator to apply the ROTATE_LOG_EVENT (at position 320) in synchronous mode.

Analysis
--------

In our tests, it was observed that the replication applier thread got stuck when it was applying the Query_log_event (ROLLBACK) as part of handling the partial relay log event. On further investigation, it was found that the Query_log_event (for BEGIN/ROLLBACK) created by the applier thread had the current time as its timestamp. As a result, the delay function `slave_sleep()` slept for MASTER_DELAY seconds since the event timestamp was less than the evaluated current timestamp.

Suggested Fix
-------------
Make the BEGIN/ROLLBACK query to execute with same timestamp as that of the master's FORMAT_DESCRIPTION_EVENT in `coord_handle_partial_binlogged_transaction()` function.
[19 Feb 2021 7:35] Venkatesh Prasad Venugopal
Patch based on 5.7.33

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: 0001-Bug-102647-Replica-sql_thread-getting-stuck-due-to-M.patch (text/x-patch), 14.31 KiB.

[19 Feb 2021 7:48] Venkatesh Prasad Venugopal
Steps to Reproduce:

Using mysqld debug binary, start two 5.7.33 GTID enabled servers using MTR

./mtr --suite=rpl --start --mem --mysqld=--gtid-mode=on --mysqld=--enforce-gtid-consistency=on --mysqld=--log-slave-updates --mysqld=--slave-parallel-workers=4

Replica:
CHANGE MASTER TO master_host='127.0.0.1', master_port=13000, master_user='root', master_auto_position=1;
CHANGE MASTER TO MASTER_DELAY=50;
SET GLOBAL DEBUG='d,stop_io_after_reading_gtid_log_event';
START SLAVE;

Source:
CREATE TABLE t1(i INT) ENGINE=InnoDB;

As soon as you execute the above CREATE TABLE query, we can see that the receiver thread on the replica server goes down and the applier thread starts executing the event and shall wait for 50 seconds.
Once SQL thread reaches back to "Slave has read all relay log; waiting for more updates" state, now execute START SLAVE IO_THREAD on the replica server.

mysql> start slave io_thread; select sleep(1); show slave status\G
Query OK, 0 rows affected (0.00 sec)

+----------+
| sleep(1) |
+----------+
|        0 |
+----------+
1 row in set (1.00 sec)

*************************** 1. row ***************************
               Slave_IO_State: 
                  Master_Host: 127.0.0.1
                  Master_User: root
                  Master_Port: 13000
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000001
          Read_Master_Log_Pos: 394
               Relay_Log_File: slave-relay-bin.000014
                Relay_Log_Pos: 454
        Relay_Master_Log_File: mysql-bin.000001
             Slave_IO_Running: No
            Slave_SQL_Running: Yes
...
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 329
              Relay_Log_Space: 1091
...
        Seconds_Behind_Master: 3280
...
             Master_Server_Id: 1111
                  Master_UUID: b04f1158-6f6a-11eb-9d24-74d83e29c093
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 50
          SQL_Remaining_Delay: 29
      Slave_SQL_Running_State: Waiting until MASTER_DELAY seconds after master executed event
           Master_Retry_Count: 10
...
           Retrieved_Gtid_Set: b04f1158-6f6a-11eb-9d24-74d83e29c093:1
            Executed_Gtid_Set: b04f1158-6f6a-11eb-9d24-74d83e29c093:1
                Auto_Position: 1
...
1 row in set (0.00 sec)

We can now see that the SQL thread enters wait every time we execute START SLAVE IO_THREAD.
[19 Feb 2021 13:25] MySQL Verification Team
Thank you Venu, Alok for the test case and report.
Verified as described with 5.7.33 build.

regards,
Umesh
[22 Feb 2021 18:57] OCA Admin
Contribution submitted via Github - Bug #102647: Replica sql_thread getting stuck due to MASTER_DELAY 
(*) Contribution by Venkatesh Prasad Venugopal (Github venkatesh-prasad-v, mysql-server/pull/325#issuecomment-781891530): I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: git_patch_576255657.txt (text/plain), 14.30 KiB.

[13 Jul 2023 4:32] Venkatesh Prasad Venugopal
Hi! Any updates on this?

--
Venkatesh Venugopal
[6 Oct 2023 7:42] MySQL Verification Team
Did this affect 8.0 ?
[18 Dec 2023 6:24] MySQL Verification Team
Hello Venu,

Sorry for my late response here.
We(including your former colleague Justin) tried to reproduce the issue using your test case to see if this issue affects current GA 8.0.35 but not seeing any issues i.e SQL thread shows "Slave_SQL_Running_State: Replica has read all relay log; waiting for more updates". Could you please confirm once from your end as well? Thank you.

regards,
Umesh
[22 Jan 7:53] MySQL Verification Team
Hello Venu,

Just following up.
Could you please confirm if you are seeing the issue on 8.0 as well? Thank you.

regards,
Umesh
[22 Jan 11:35] Venkatesh Prasad Venugopal
Hi Umesh,

I confirm that this bug exists only in 5.7 and was not reproducible in 8.0. IIRC, this got fixed by one of the worklogs related to including granular timestamps in binlog.

--
Venu
[22 Jan 12:39] MySQL Verification Team
Thank you so much, Venu for the confirmation.

regards,
Umesh
[23 Jan 5:29] MySQL Verification Team
Hello Alok, Venu,

Thank you Venu for the confirmation. Since 5.7 is EOL, we won't be fixing this in 5.7. Closing this report for now(as reported problem not seen on 8.0). Thank you both of you for all the help.

regards,
Umesh