Bug #81385 relay log info not updated after using sql_after_mts_gaps
Submitted: 12 May 2016 0:46 Modified: 16 Sep 2019 14:38
Reporter: Trey Raymond Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:5.6.29 OS:Any
Assigned to: CPU Architecture:Any
Tags: MTS

[12 May 2016 0:46] Trey Raymond
Description:
consider the case where you are using sql_after_mts_gaps, for example, restoring a backup from mts as a slave or using binlog based PITR.

the server will init a relay log automatically, if none is provided.  then, normally, the user would copy the binlog/relay log(s) from the backup as shown in slave_worker_info, and run start slave until sql_after_mts_gaps.

however, the slave fails to update relay log info (with TABLE mysql.slave_relay_log_info) with the relay log file/position from the sync point.  Therefore, a subsequent "start slave" which would assume everything is in sync actually uses the old values (usually the start of the log file).

mysql> select id,Relay_log_name,Relay_log_pos,Master_log_name,Master_log_pos from mysql.slave_worker_info;
+----+------------------------------------------------+---------------+-------------------+----------------+
| id | Relay_log_name                                 | Relay_log_pos | Master_log_name   | Master_log_pos |
+----+------------------------------------------------+---------------+-------------------+----------------+
|  1 | /*/mysqld-relay-bin.000114 |     649715248 | mysqld-bin.000036 |      649715084 |
|  2 | /*/mysqld-relay-bin.000114 |     651227473 | mysqld-bin.000036 |      651227309 |
|  3 | /*/mysqld-relay-bin.000114 |     651226653 | mysqld-bin.000036 |      651226489 |
+----+------------------------------------------------+---------------+-------------------+----------------+
3 rows in set (0.00 sec)
 
mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: 
                  Master_Host: master.host
                  Master_User: redacting_is_fun
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysqld-bin.000036
          Read_Master_Log_Pos: 651227309
               Relay_Log_File: mysqld-relay-bin.000001
                Relay_Log_Pos: 4
        Relay_Master_Log_File: mysqld-bin.000036
             Slave_IO_Running: No
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 651227309
              Relay_Log_Space: 120
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 0
                  Master_UUID: 9c95cb9f-0c9b-11e6-a512-d4ae5276557d
             Master_Info_File: /*/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: 
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 
                Auto_Position: 0
1 row in set (0.00 sec)
 
mysql> START SLAVE UNTIL SQL_AFTER_MTS_GAPS;
Query OK, 0 rows affected, 1 warning (0.01 sec)
 
mysql>  select id,Relay_log_name,Relay_log_pos,Master_log_name,Master_log_pos from mysql.slave_worker_info;
+----+----------------+---------------+-----------------+----------------+
| id | Relay_log_name | Relay_log_pos | Master_log_name | Master_log_pos |
+----+----------------+---------------+-----------------+----------------+
|  1 |                |             0 |                 |              0 |
|  2 |                |             0 |                 |              0 |
|  3 |                |             0 |                 |              0 |
+----+----------------+---------------+-----------------+----------------+
3 rows in set (0.00 sec)
 
mysql>  show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Queueing master event to the relay log
                  Master_Host: master.host
                  Master_User: redacting_is_fun
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysqld-bin.000036
          Read_Master_Log_Pos: 778970368
               Relay_Log_File: mysqld-relay-bin.000001
                Relay_Log_Pos: 4
        Relay_Master_Log_File: mysqld-bin.000036
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 651227309
              Relay_Log_Space: 127743517
              Until_Condition: SQL_AFTER_MTS_GAPS
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 181593830
                  Master_UUID: 9c95cb9f-0c9b-11e6-a512-d4ae5276557d
             Master_Info_File: /*/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: 
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 
                Auto_Position: 0
1 row in set (0.00 sec)

OH NO! Relay_Log_Pos: 4 !

How to repeat:
take a physical backup from a server where the sql worker threads are not synced, like enterprise backup or xtrabackup.  offline rsync should have the same behavior, though that is uncommon.

restore the backup on a fresh server, allow mysql to init logs normally

check mysql.slave_worker_info to see what files you need, copy them into the data dir
-- IF using binlogs instead of relay logs: update mysql.slave_worker_info set relay_log_name='file you put there, copy of master_log_name',relay_log_pos=master_log_pos;

start slave sql_thread until sql_after_mts_gaps;

-- check status from show slave or mysql.slave tables

start slave;

-- oops!  broken now.  trying to reprocess the entire log file from the start.

Suggested fix:
a workaround is to get the *master* log file/exec_pos and run a CHANGE MASTER TO based on that or use it as a start point for PITR.  granted, with PITR, that's not a big deal; but setting up a slave is a much more common task, and when you'd assume everything is in sync, it's easy to break.
Fix would be either:
- update checkpointing to do this and force at stop time from the worker threads
- add the logic to the stop condition check (https://github.com/mysql/mysql-server/blob/5.6/sql/rpl_rli.cc#L1349)
[16 Sep 2019 14:00] MySQL Verification Team
Hi,

I'm verifying the "behavior", so yes, I can reproduce your test case, but I'll have to leave the decision on how to process it / in what way to fix it, to the dev team.
[16 Sep 2019 14:38] Trey Raymond
GTID adoption is far more widespread now than it was back in 2016, so this probably doesn't affect many people anymore - I suppose a decision on a fix would be based on if file/pos replication will continue to be supported past mysql 8
[16 Sep 2019 14:54] MySQL Verification Team
Hi,

> I suppose a decision on a fix would be based on if file/pos 
> replication will continue to be supported past mysql 8

As for supporting file/pos replication, I think it will stay with us for a while, change is mostly turning gtid on by default, and wrt this particulate bug, I guess the "fix" will be to document this behavior rather than change it.

kind regards