Description:
consider the case where you are using sql_after_mts_gaps, for example, restoring a backup from mts as a slave or using binlog based PITR.
the server will init a relay log automatically, if none is provided. then, normally, the user would copy the binlog/relay log(s) from the backup as shown in slave_worker_info, and run start slave until sql_after_mts_gaps.
however, the slave fails to update relay log info (with TABLE mysql.slave_relay_log_info) with the relay log file/position from the sync point. Therefore, a subsequent "start slave" which would assume everything is in sync actually uses the old values (usually the start of the log file).
mysql> select id,Relay_log_name,Relay_log_pos,Master_log_name,Master_log_pos from mysql.slave_worker_info;
+----+------------------------------------------------+---------------+-------------------+----------------+
| id | Relay_log_name | Relay_log_pos | Master_log_name | Master_log_pos |
+----+------------------------------------------------+---------------+-------------------+----------------+
| 1 | /*/mysqld-relay-bin.000114 | 649715248 | mysqld-bin.000036 | 649715084 |
| 2 | /*/mysqld-relay-bin.000114 | 651227473 | mysqld-bin.000036 | 651227309 |
| 3 | /*/mysqld-relay-bin.000114 | 651226653 | mysqld-bin.000036 | 651226489 |
+----+------------------------------------------------+---------------+-------------------+----------------+
3 rows in set (0.00 sec)
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State:
Master_Host: master.host
Master_User: redacting_is_fun
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysqld-bin.000036
Read_Master_Log_Pos: 651227309
Relay_Log_File: mysqld-relay-bin.000001
Relay_Log_Pos: 4
Relay_Master_Log_File: mysqld-bin.000036
Slave_IO_Running: No
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 651227309
Relay_Log_Space: 120
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 0
Master_UUID: 9c95cb9f-0c9b-11e6-a512-d4ae5276557d
Master_Info_File: /*/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State:
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
1 row in set (0.00 sec)
mysql> START SLAVE UNTIL SQL_AFTER_MTS_GAPS;
Query OK, 0 rows affected, 1 warning (0.01 sec)
mysql> select id,Relay_log_name,Relay_log_pos,Master_log_name,Master_log_pos from mysql.slave_worker_info;
+----+----------------+---------------+-----------------+----------------+
| id | Relay_log_name | Relay_log_pos | Master_log_name | Master_log_pos |
+----+----------------+---------------+-----------------+----------------+
| 1 | | 0 | | 0 |
| 2 | | 0 | | 0 |
| 3 | | 0 | | 0 |
+----+----------------+---------------+-----------------+----------------+
3 rows in set (0.00 sec)
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Queueing master event to the relay log
Master_Host: master.host
Master_User: redacting_is_fun
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysqld-bin.000036
Read_Master_Log_Pos: 778970368
Relay_Log_File: mysqld-relay-bin.000001
Relay_Log_Pos: 4
Relay_Master_Log_File: mysqld-bin.000036
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 651227309
Relay_Log_Space: 127743517
Until_Condition: SQL_AFTER_MTS_GAPS
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 181593830
Master_UUID: 9c95cb9f-0c9b-11e6-a512-d4ae5276557d
Master_Info_File: /*/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State:
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
1 row in set (0.00 sec)
OH NO! Relay_Log_Pos: 4 !
How to repeat:
take a physical backup from a server where the sql worker threads are not synced, like enterprise backup or xtrabackup. offline rsync should have the same behavior, though that is uncommon.
restore the backup on a fresh server, allow mysql to init logs normally
check mysql.slave_worker_info to see what files you need, copy them into the data dir
-- IF using binlogs instead of relay logs: update mysql.slave_worker_info set relay_log_name='file you put there, copy of master_log_name',relay_log_pos=master_log_pos;
start slave sql_thread until sql_after_mts_gaps;
-- check status from show slave or mysql.slave tables
start slave;
-- oops! broken now. trying to reprocess the entire log file from the start.
Suggested fix:
a workaround is to get the *master* log file/exec_pos and run a CHANGE MASTER TO based on that or use it as a start point for PITR. granted, with PITR, that's not a big deal; but setting up a slave is a much more common task, and when you'd assume everything is in sync, it's easy to break.
Fix would be either:
- update checkpointing to do this and force at stop time from the worker threads
- add the logic to the stop condition check (https://github.com/mysql/mysql-server/blob/5.6/sql/rpl_rli.cc#L1349)
Description: consider the case where you are using sql_after_mts_gaps, for example, restoring a backup from mts as a slave or using binlog based PITR. the server will init a relay log automatically, if none is provided. then, normally, the user would copy the binlog/relay log(s) from the backup as shown in slave_worker_info, and run start slave until sql_after_mts_gaps. however, the slave fails to update relay log info (with TABLE mysql.slave_relay_log_info) with the relay log file/position from the sync point. Therefore, a subsequent "start slave" which would assume everything is in sync actually uses the old values (usually the start of the log file). mysql> select id,Relay_log_name,Relay_log_pos,Master_log_name,Master_log_pos from mysql.slave_worker_info; +----+------------------------------------------------+---------------+-------------------+----------------+ | id | Relay_log_name | Relay_log_pos | Master_log_name | Master_log_pos | +----+------------------------------------------------+---------------+-------------------+----------------+ | 1 | /*/mysqld-relay-bin.000114 | 649715248 | mysqld-bin.000036 | 649715084 | | 2 | /*/mysqld-relay-bin.000114 | 651227473 | mysqld-bin.000036 | 651227309 | | 3 | /*/mysqld-relay-bin.000114 | 651226653 | mysqld-bin.000036 | 651226489 | +----+------------------------------------------------+---------------+-------------------+----------------+ 3 rows in set (0.00 sec) mysql> show slave status\G *************************** 1. row *************************** Slave_IO_State: Master_Host: master.host Master_User: redacting_is_fun Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysqld-bin.000036 Read_Master_Log_Pos: 651227309 Relay_Log_File: mysqld-relay-bin.000001 Relay_Log_Pos: 4 Relay_Master_Log_File: mysqld-bin.000036 Slave_IO_Running: No Slave_SQL_Running: No Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 651227309 Relay_Log_Space: 120 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: NULL Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 0 Master_UUID: 9c95cb9f-0c9b-11e6-a512-d4ae5276557d Master_Info_File: /*/master.info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: Executed_Gtid_Set: Auto_Position: 0 1 row in set (0.00 sec) mysql> START SLAVE UNTIL SQL_AFTER_MTS_GAPS; Query OK, 0 rows affected, 1 warning (0.01 sec) mysql> select id,Relay_log_name,Relay_log_pos,Master_log_name,Master_log_pos from mysql.slave_worker_info; +----+----------------+---------------+-----------------+----------------+ | id | Relay_log_name | Relay_log_pos | Master_log_name | Master_log_pos | +----+----------------+---------------+-----------------+----------------+ | 1 | | 0 | | 0 | | 2 | | 0 | | 0 | | 3 | | 0 | | 0 | +----+----------------+---------------+-----------------+----------------+ 3 rows in set (0.00 sec) mysql> show slave status\G *************************** 1. row *************************** Slave_IO_State: Queueing master event to the relay log Master_Host: master.host Master_User: redacting_is_fun Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysqld-bin.000036 Read_Master_Log_Pos: 778970368 Relay_Log_File: mysqld-relay-bin.000001 Relay_Log_Pos: 4 Relay_Master_Log_File: mysqld-bin.000036 Slave_IO_Running: Yes Slave_SQL_Running: No Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 651227309 Relay_Log_Space: 127743517 Until_Condition: SQL_AFTER_MTS_GAPS Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: NULL Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 181593830 Master_UUID: 9c95cb9f-0c9b-11e6-a512-d4ae5276557d Master_Info_File: /*/master.info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: Executed_Gtid_Set: Auto_Position: 0 1 row in set (0.00 sec) OH NO! Relay_Log_Pos: 4 ! How to repeat: take a physical backup from a server where the sql worker threads are not synced, like enterprise backup or xtrabackup. offline rsync should have the same behavior, though that is uncommon. restore the backup on a fresh server, allow mysql to init logs normally check mysql.slave_worker_info to see what files you need, copy them into the data dir -- IF using binlogs instead of relay logs: update mysql.slave_worker_info set relay_log_name='file you put there, copy of master_log_name',relay_log_pos=master_log_pos; start slave sql_thread until sql_after_mts_gaps; -- check status from show slave or mysql.slave tables start slave; -- oops! broken now. trying to reprocess the entire log file from the start. Suggested fix: a workaround is to get the *master* log file/exec_pos and run a CHANGE MASTER TO based on that or use it as a start point for PITR. granted, with PITR, that's not a big deal; but setting up a slave is a much more common task, and when you'd assume everything is in sync, it's easy to break. Fix would be either: - update checkpointing to do this and force at stop time from the worker threads - add the logic to the stop condition check (https://github.com/mysql/mysql-server/blob/5.6/sql/rpl_rli.cc#L1349)