Description:
consider the case where you are using sql_after_mts_gaps, for example, restoring a backup from mts as a slave or using binlog based PITR.
the server will init a relay log automatically, if none is provided. then, normally, the user would copy the binlog/relay log(s) from the backup as shown in slave_worker_info, and run start slave until sql_after_mts_gaps.
however, the slave fails to update relay log info (with TABLE mysql.slave_relay_log_info) with the relay log file/position from the sync point. Therefore, a subsequent "start slave" which would assume everything is in sync actually uses the old values (usually the start of the log file).
mysql> select id,Relay_log_name,Relay_log_pos,Master_log_name,Master_log_pos from mysql.slave_worker_info;
+----+------------------------------------------------+---------------+-------------------+----------------+
| id | Relay_log_name | Relay_log_pos | Master_log_name | Master_log_pos |
+----+------------------------------------------------+---------------+-------------------+----------------+
| 1 | /*/mysqld-relay-bin.000114 | 649715248 | mysqld-bin.000036 | 649715084 |
| 2 | /*/mysqld-relay-bin.000114 | 651227473 | mysqld-bin.000036 | 651227309 |
| 3 | /*/mysqld-relay-bin.000114 | 651226653 | mysqld-bin.000036 | 651226489 |
+----+------------------------------------------------+---------------+-------------------+----------------+
3 rows in set (0.00 sec)
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State:
Master_Host: master.host
Master_User: redacting_is_fun
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysqld-bin.000036
Read_Master_Log_Pos: 651227309
Relay_Log_File: mysqld-relay-bin.000001
Relay_Log_Pos: 4
Relay_Master_Log_File: mysqld-bin.000036
Slave_IO_Running: No
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 651227309
Relay_Log_Space: 120
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 0
Master_UUID: 9c95cb9f-0c9b-11e6-a512-d4ae5276557d
Master_Info_File: /*/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State:
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
1 row in set (0.00 sec)
mysql> START SLAVE UNTIL SQL_AFTER_MTS_GAPS;
Query OK, 0 rows affected, 1 warning (0.01 sec)
mysql> select id,Relay_log_name,Relay_log_pos,Master_log_name,Master_log_pos from mysql.slave_worker_info;
+----+----------------+---------------+-----------------+----------------+
| id | Relay_log_name | Relay_log_pos | Master_log_name | Master_log_pos |
+----+----------------+---------------+-----------------+----------------+
| 1 | | 0 | | 0 |
| 2 | | 0 | | 0 |
| 3 | | 0 | | 0 |
+----+----------------+---------------+-----------------+----------------+
3 rows in set (0.00 sec)
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Queueing master event to the relay log
Master_Host: master.host
Master_User: redacting_is_fun
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysqld-bin.000036
Read_Master_Log_Pos: 778970368
Relay_Log_File: mysqld-relay-bin.000001
Relay_Log_Pos: 4
Relay_Master_Log_File: mysqld-bin.000036
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 651227309
Relay_Log_Space: 127743517
Until_Condition: SQL_AFTER_MTS_GAPS
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 181593830
Master_UUID: 9c95cb9f-0c9b-11e6-a512-d4ae5276557d
Master_Info_File: /*/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State:
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
1 row in set (0.00 sec)
OH NO! Relay_Log_Pos: 4 !
How to repeat:
take a physical backup from a server where the sql worker threads are not synced, like enterprise backup or xtrabackup. offline rsync should have the same behavior, though that is uncommon.
restore the backup on a fresh server, allow mysql to init logs normally
check mysql.slave_worker_info to see what files you need, copy them into the data dir
-- IF using binlogs instead of relay logs: update mysql.slave_worker_info set relay_log_name='file you put there, copy of master_log_name',relay_log_pos=master_log_pos;
start slave sql_thread until sql_after_mts_gaps;
-- check status from show slave or mysql.slave tables
start slave;
-- oops! broken now. trying to reprocess the entire log file from the start.
Suggested fix:
a workaround is to get the *master* log file/exec_pos and run a CHANGE MASTER TO based on that or use it as a start point for PITR. granted, with PITR, that's not a big deal; but setting up a slave is a much more common task, and when you'd assume everything is in sync, it's easy to break.
Fix would be either:
- update checkpointing to do this and force at stop time from the worker threads
- add the logic to the stop condition check (https://github.com/mysql/mysql-server/blob/5.6/sql/rpl_rli.cc#L1349)