Description:
MySQL documentation related to replica_parallel_workers is conflicting. Doc https://dev.mysql.com/doc/refman/8.0/en/replication-threads.html states that a v8 MySQL multithreaded replica is "where replica_parallel_workers or slave_parallel_workers is greater than 0", with no clarification provided for minor version behavior changes.
However, https://dev.mysql.com/doc/refman/8.0/en/replication-options-replica.html#sysvar_replica_pa... tells a different story for those using the most recent version of MySQL:
```
replica_parallel_workers enables multithreading on the replica and sets the number of applier threads for executing replication transactions in parallel. When the value is a number greater than 1, the replica is a multithreaded replica with the specified number of applier threads, plus a coordinator thread to manage them. If you are using multiple replication channels, each channel has this number of threads.
Before MySQL 8.0.27, the default for this system variable is 0, so replicas are single-threaded by default. Beginning with MySQL 8.0.27, the default is 4, so replicas are multithreaded by default.
As of MySQL 8.0.30, setting this variable to 0 is deprecated, raises a warning, and is subject to removal in a future MySQL release. For a single worker, set replica_parallel_workers to 1 instead.
```
In an environment where the replica is running v8.0.35 & the primary is running v8.0.26, and the replica is set to replica_parallel_workers=1, the replica will misidentify itself as running "parallel" replication and will generate replication errors for anonymous transactions, for example:
```
Last_SQL_Error: Cannot execute the current event group in the parallel mode. Encountered event Anonymous_Gtid, relay-log name ./XXXXX-relay-bin.XXXXX, position XXXX which prevents execution of this event group in parallel mode. Reason: The source event is logically timestamped incorrectly
```
Here is status output from the replica:
```
show replica status\G
*************************** 1. row ***************************
Replica_IO_State: Waiting for source to send event
Source_Host: XXXXXXXXXX
Source_User: XXXXXXXXXX
Source_Port: XXXXXXXXXX
Connect_Retry: 60
Source_Log_File: mysql-bin-changelog.XXXXXXXXXX
Read_Source_Log_Pos: 2548XXXXXXXXXX
Relay_Log_File: XXXXXXXXXX-relay-bin.XXXXXXXXXX
Relay_Log_Pos: 30282XXXXXXXXXX
Relay_Source_Log_File: mysql-bin-changelog.XXXXXXXXXX
Replica_IO_Running: Yes
Replica_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1755
Last_Error: Cannot execute the current event group in the parallel mode. Encountered event Anonymous_Gtid, relay-log name ./XXXXXXXXXX-relay-bin.XXXXXXXXXX, position XXXXXXXXXX which prevents execution of this event group in parallel mode. Reason: The source event is logically timestamped incorrectly..
Skip_Counter: 0
Exec_Source_Log_Pos: XXXXXXXXXX
Relay_Log_Space: 138422641521
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Source_SSL_Allowed: No
Source_SSL_CA_File:
Source_SSL_CA_Path:
Source_SSL_Cert:
Source_SSL_Cipher:
Source_SSL_Key:
Seconds_Behind_Source: NULL
Source_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 1755
Last_SQL_Error: Cannot execute the current event group in the parallel mode. Encountered event Anonymous_Gtid, relay-log name ./XXXXXXXXXX-relay-bin.XXXXXXXXXX, position XXXXXXXXXX which prevents execution of this event group in parallel mode. Reason: The source event is logically timestamped incorrectly..
Replicate_Ignore_Server_Ids:
Source_Server_Id: 102118XXXXXXXXXX
Source_UUID: 649bb58c-XXXXXXXXXX-XXXXXXXXXX-XXXXXXXXXX-XXXXXXXXXX
Source_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Replica_SQL_Running_State:
Source_Retry_Count: 86400
Source_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp: 231124 06:51:02
Source_SSL_Crl:
Source_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
Replicate_Rewrite_DB:
Channel_Name:
Source_TLS_Version:
Source_public_key_path:
Get_Source_public_key: 0
Network_Namespace:
```
Here's the relay log:
```
# at 279
#231124 6:37:45 server id 1021180011 end_log_pos 3028211525 CRC32 0x671b2b36 Anonymous_GTID last_committed=0 sequence_number=1 rbr_only=yes original_committed_timestamp=1700807865819811 immediate_commit_timestamp=1700807865819811 transaction_length=72714
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
# original_commit_timestamp=1700807865819811 (2023-11-24 06:37:45.819811 UTC)
# immediate_commit_timestamp=1700807865819811 (2023-11-24 06:37:45.819811 UTC)
/*!80001 SET @@session.original_commit_timestamp=1700807865819811*//*!*/;
/*!80014 SET @@session.original_server_version=80026*//*!*/;
/*!80014 SET @@session.immediate_server_version=80026*//*!*/;
SET @@SESSION.GTID_NEXT= 'ANONYMOUS'/*!*/;
```
Theres a couple of things going on here. For one, note the sequence_number=1. This is not the first anonymous transaction received by the replica. I'm not entirely clear on the sequence behavior governance. However, this replica should be running with a single worker, not in parallel. The sequence number shouldn't matter, but perhaps I'm not understanding that correctly (?).
I am able to repair the situation using this sequence of commands:
```
SET @@GLOBAL.replica_parallel_workers = 0;
START REPLICA UNTIL SQL_AFTER_MTS_GAPS;
START REPLICA SQL_THREAD;
```
However, I cannot simply edit mysqld.cnf & set replica_parallel_workers = 0, because the default behavior for v8.0.37 is to raise a WARNING for this and quietly change the value from 0 to 4. According to the relevant documentation, replica_parallel_workers = 0 isn't correct. There seems to be something within MySQL that is still identifying this replica as a single worker event when replica_parallel_workers = 1.
How to repeat:
step 1. create a mysql primary where gtid_mode is set to OFF_PERMISSIVE & binlog is enabled. mysql version v8.0.26.
step 2. create a mysql replica with the following settings; all other settings default for mysql v8.0.35.
[mysqld]
ssl-ca = /usr1/certs/server.cas
ssl-cert = /usr1/certs/server.crt
ssl-key = /usr1/certs/server.key
user = mysql
pid-file = /var/run/mysqld/mysqld.pid
socket = /var/run/mysqld/mysqld.sock
datadir = /usr1/mysql
gtid_mode = OFF
innodb_flush_log_at_trx_commit = 1
sync_binlog = 1
read_only = ON
tmpdir = /usr1/tmp
bind-address = 0.0.0.0
mysqlx-bind-address = 0.0.0.0
innodb_file_per_table = ON
key_buffer_size = 16M
innodb_buffer_pool_size = 48G
max_allowed_packet = 512M
innodb_log_file_size = 2G
innodb_log_buffer_size = 1G
innodb_stats_on_metadata = OFF
back_log = 512
innodb_io_capacity_max = 10000
wait_timeout = 172800
max_connections = 1000
max_connect_errors = 1000
connect_timeout = 180
replica_net_timeout = 180
net_read_timeout = 600
net_write_timeout = 600
thread_stack = 256K
sql_mode = ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION
event_scheduler = OFF
log_error = /var/log/mysql/error.log
log_error_verbosity = 3
server-id = 5555
binlog_format = ROW
binlog_expire_logs_seconds = 604800
performance_schema = OFF
replica_parallel_workers = 1
relay_log_recovery = ON
step 3. start replication:
CHANGE REPLICATION SOURCE TO \
SOURCE_HOST='XXXX', \
SOURCE_USER='XXXX', SOURCE_PASSWORD='XXXX', \
SOURCE_LOG_FILE='mysql-bin-changelog.XXXXXX', SOURCE_LOG_POS=XXXXXXX;
START REPLICATION;
Step 4. generate an anonymous_gtid event with an out-of-order sequence_number. I'm not sure how to do this or why our local env does this on occasion. this will break replication.
Suggested fix:
https://dev.mysql.com/doc/refman/8.0/en/replication-threads.html should be updated ASAP to make it clear that there are possibly breaking changes in the minor 8.0.x versions for replica_parallel_workers.
Its possible this could be resolved with matching minor versions for both source & replica, but there are reasonable circumstances where minor versions might be at least temporarily different. Ideally, DBAs would have a clear course of action for avoiding replication failure in those circumstances.
But I'm not sure why the primary's version or parallelism should be relevant. mysql replicas should be able to properly recognize their own parallel state, right?