Bug #115811 Replication Delay consistent at 1 second.
Submitted: 9 Aug 2024 18:59 Modified: 12 Aug 2024 19:51
Reporter: Gabe Tucker Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:8.0.35-27 OS:Ubuntu (Only Tested on Ubuntu)
Assigned to: MySQL Verification Team CPU Architecture:Any
Tags: replication performance

[9 Aug 2024 18:59] Gabe Tucker
Description:
There are a few issues around the documentation and function of replica_parallel_workers and replica_parallel_type.

We found these issues when replicating from a 5.7.37-40 Master to a 8.0.35-27 Replica.  The instance is using only one database.

We were experiencing a replication delay where seconds_behind_master would fluctuate between 0 and 1 second (mostly at 1 second).  This master has other replicas, that are in 5.7.37-40.  None of them had any delay.

We were able to correct this issue with two changes.  We set replica_parallel_workers to 0 and replica_parallel_type to DATABASE.

A few things to note.  Quotes are taking from this link: https://dev.mysql.com/doc/refman/8.0/en/replication-options-replica.html

- replica_preserve_commit_order is ON.  Which, according to your documentation, is not possible.
"When replica_preserve_commit_order or slave_preserve_commit_order is enabled, you must use LOGICAL_CLOCK. Before MySQL 8.0.27, DATABASE is the default. From MySQL 8.0.27, multithreading is enabled by default for replica servers (replica_parallel_workers=4 by default), and LOGICAL_CLOCK is the default. (In MySQL 8.0.27 and later, replica_preserve_commit_order is also enabled by default.)"

- "As of MySQL 8.0.30, setting this variable to 0 is deprecated, raises a warning, and is subject to removal in a future MySQL release. For a single worker, set replica_parallel_workers to 1 instead."
For our case, setting replica_parallel_workers to 1 did not work.  I'd like our issue to be considered as any other setting would cause the seconds_behind_master to fluctuate between 0 and 1 second, as previously mentioned.

Here is our my.cnf for the 8.0 instance:
# MySQL 8.x Configuration File
#

[client]
socket                          = /var/lib/mysql/mysql.sock
default-auth                    = mysql_native_password

[mysqld]
default_authentication_plugin   = mysql_native_password
open_files_limit                = 65535
pid_file                        = /var/run/mysqld/mysqld.pid
datadir                         = /var/lib/mysql
tmpdir                          = /var/lib/mysqltmp
socket                          = /var/lib/mysql/mysql.sock
skip-name-resolve
sql_mode                        = NO_UNSIGNED_SUBTRACTION,ALLOW_INVALID_DATES
default-time-zone               = '+00:00'
explicit_defaults_for_timestamp = ON
event_scheduler                 = ON
user                            = mysql

## Cache
table_open_cache                = 5120
table_definition_cache          = -1

## Per-thread Buffers
sort_buffer_size                = 256K
read_buffer_size                = 128K
read_rnd_buffer_size            = 256K
join_buffer_size                = 256K

## Temp Tables
tmp_table_size                  = 128M
max_heap_table_size             = 128M

## Networking
back_log                        = 100
max_connections                 = 2000
max_connect_errors              = 10000
max_allowed_packet              = 64M
interactive_timeout             = 600
wait_timeout                    = 600

## InnoDB
innodb_dedicated_server         = 1
innodb_flush_method             = O_DIRECT
innodb_old_blocks_pct           = 15
innodb_numa_interleave          = 1
innodb_cleaner_lsn_age_factor   = high_checkpoint
innodb_read_ahead_threshold     = 0
innodb_io_capacity              = 900
innodb_log_write_ahead_size     = 4096
innodb_flush_log_at_trx_commit  = 2
innodb_log_buffer_size          = 512M

## Transactions
transaction_isolation           = READ-COMMITTED

## Logging
log_output                      = FILE
log-error                       = /var/log/mysqld.log

# log_error_verbosity values:
log_error_verbosity             = 2

# MY-013360 - Warning that repeats every few seconds
# [Warning] [MY-013360] [Server] Plugin mysql_native_password reported: ''mysql_native_password'
# is deprecated and will be removed in a future release. Please use caching_sha2_password instead'
log_error_suppression_list      = MY-013360

slow-query-log                  = 1
slow_query_log_file             = /var/lib/mysqllogs/slow-log
log_slow_replica_statements     = 1
long_query_time                 = 1
general_log_file                = /var/lib/mysqllogs/benet-prod-ue1-mysql-count80-write-1.log
general-log                     = OFF

## Replication
server_id                       = 172660940
log-bin                         = /var/lib/mysqllogs/benet-prod-ue1-mysql-count80-write-1-bin-log
relay-log                       = /var/lib/mysqllogs/benet-prod-ue1-mysql-count80-write-1-relay-log
relay_log_space_limit           = 16G
binlog_expire_logs_seconds      = 345600
read_only                       = OFF
log_replica_updates             = 1
sync_binlog                     = 0
skip-replica-start              = 0
auto_increment_offset           = 1
auto_increment_increment        = 2
relay_log_recovery              = ON
gtid_mode                       = OFF_PERMISSIVE
enforce_gtid_consistency        = WARN
report_host                     = ip-10-74-152-204
replica_parallel_workers        = 0
replica_parallel_type           = DATABASE

# Network minimal binary log setting
binlog_row_image                = minimal

## SSL
ssl_ca=/etc/mysql-ssl/ca-cert.pem
ssl_cert=/etc/mysql-ssl/server-cert.pem
ssl_key=/etc/mysql-ssl/server-key.pem

How to repeat:
Have a 5.7.37-40 master replicate to a 8.0.35-27 with one database.  If you use the default settings for replica_parallel_workers (4) and replica_parallel_type (LOGICAL_CLOCK) - then you will see the fairly consistent 1 second delay.

Set replica_parallel_type to DATABASE and replica_parallel_workers to 0 and the delay will be gone.
[12 Aug 2024 11:07] MySQL Verification Team
Hi,

I am not sure I understand the report?
You are reporting that you have 1sec replication delay or something else?
1sec replication delay is not a bug.
[12 Aug 2024 13:36] Gabe Tucker
Hi,

I believe that there is a bug with replica_parallel_workers and replica_parallel_type.  I agree that 1 one second delay is not a bug :)

As I mentioned, we have a single database instance, 5.7.37-40 master with an 8.0.35-27 replica.

Our 5.7.37-40 replica was not experiencing any replications delays while we had consistent replication delays on our 8.0.35-27.  We checked the settings that we have on the 5.7 replicas.  They have replica_parallel_workers=0 and replica_parallel_type="DATABASE"

As setting replica_parallel_workers to 0 is deprecated and will be removed, and that we only need one replica sql thread, we set this to 1, which we expect to be one thread to apply the replication transactions.

Additionally, replica_parallel_type of DATABASE is deprecated in this versions and will be removed, we set this to "LOGICAL CLOCK"

However, with these settings, we experience constant replication delay.

When we set replica_parallel_workers=0 and replica_parallel_type="DATABASE", the 8.0.35-27 replica no longer has replication delay.

The documentation says that both replica_parallel_workers=0 and replica_parallel_type="DATABASE" are being deprecated:

"As of MySQL 8.0.30, setting this variable to 0 is deprecated, raises a warning, and is subject to removal in a future MySQL release. For a single worker, set replica_parallel_workers to 1 instead.
...
replica_parallel_type is deprecated beginning with MySQL 8.0.29, as is support for parallelization of transactions using database partitioning. Expect support for these to be removed in a future release, and for LOGICAL_CLOCK to be used exclusively thereafter."

Since the recommended settings for our single database, replica_parallel_workers=1 and replica_parallel_type="LOGICAL_CLOCK", cause us replication delays, we believe that there is a bug.

Thank You
[12 Aug 2024 15:36] MySQL Verification Team
Hi,

Problem is that difference between 0 and 1 is not something we can use as important factor as we are "rounding" here so we are talking probably between difference of 0.4 and 0.6 seconds between two configs. This is why I cannot accept this as "causing replication delay". There is always a delay in replication, it is impossible to have slave lag really 0 seconds as there is time required to send data to slave and implement the changes, maybe it will be in microseconds but it will not be 0. Since we are rounding to a whole second, 1 second is not a delay really - not a change I can call "causing replication delay".
[12 Aug 2024 17:10] Gabe Tucker
I understand your points and I feel that there is something not working properly.  If I have more time, I will work on this more to better prove my point.  Unfortunately, I do not have that time currently.

Thank You
[12 Aug 2024 19:51] MySQL Verification Team
Hi Gabe,

Please do. I tried this myself and I get 1 on both, as expected. If we were talking about 10sec and 15sec it would make sense but 1sec difference can be from 0.01sec to 1.4sec difference.

Let us know if you manage to get more data about it