Bug #69811 Unable to repair replication after Lost_Events incident using GTID
Submitted: 22 Jul 2013 3:30 Modified: 24 Sep 2014 14:39
Reporter: Warren Yates Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Documentation Severity:S2 (Serious)
Version:5.6.11-ndb-7.3.2-cluster OS:Microsoft Windows
Assigned to: David Moss CPU Architecture:Any
Tags: GTID, LOST_EVENTS, replication
Triage: Needs Triage: D2 (Serious)

[22 Jul 2013 3:30] Warren Yates
Description:
Using GTID for our replication, one of the slave servers have encountered a Lost_Events incident. Previously I have simply skipped the next transaction and everything was fine. Now using GTID's this is not an option, I've tried injecting an empty transaction however the Lost_Events incident remains.  I tried injecting many empty transactions to the point of catching up with the master position, still no luck. 
Short of reloading the data I cannot seem to fix the Lost_Events incident and resume replication? 

How to repeat:
Create a Lost_Events incident, stop slave and inject empty transaction(s). Start slave and check slave status. Lost_Events incident still remains in place.
[22 Jul 2013 7:32] Valeriy Kravchuk
This is too general problem report for now. Had you tried approaches mentioned in http://www.mysqlperformanceblog.com/2013/03/26/repair-mysql-5-6-gtid-replication-by-inject...? Any steps to reproduce your exact problem, output of show slave status\G etc?
[24 Jul 2013 0:06] Warren Yates
Thanks Valeriy, 
Thanks for the pointers however as I stated in the original post I have tried to inject an empty transaction, in fact, I've tried injecting a number of them to see if there were multiple LOST_EVENTS causing the problem.  Even when the Retrieved caught up with the Exicuted Gtid set the replication was still broken due to a LOST_EVENTS_INCEDENT.  Unfortunalty all I can say to repolicate the problem is to produce a LOST_EVENTS_INCEIDENT and then try to repair by injecting empty transactions.
Here is my SHOW_SLAVE_STATUS:
mysql> show slave  status \G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.4.70.201
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: binlog.000006
          Read_Master_Log_Pos: 36663990
               Relay_Log_File: mysqld-relay-bin.000004
                Relay_Log_Pos: 395
        Relay_Master_Log_File: binlog.000004
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB: acesmerp,acesmerp
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 1590
                   Last_Error: The incident LOST_EVENTS occured on the master. M
essage: mysqld startup
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 191
              Relay_Log_Space: 37947508
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 1590
               Last_SQL_Error: The incident LOST_EVENTS occured on the master. M
essage: mysqld startup
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 101
                  Master_UUID: 56c22d30-e767-11e2-a5b6-02bf0a0446c8
             Master_Info_File: E:\MySQL_Data\master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State:
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp: 130724 09:37:36
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set: 56c22d30-e767-11e2-a5b6-02bf0a0446c8:1-2402
            Executed_Gtid_Set: 56c22d30-e767-11e2-a5b6-02bf0a0446c8:1-1033
                Auto_Position: 0
1 row in set (0.00 sec)
mysql> SET GTID_NEXT="56c22d30-e767-11e2-a5b6-02bf0a0446c8:1034";
Query OK, 0 rows affected (0.00 sec)
mysql> BEGIN; COMMIT;
Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

mysql> SET GTID_NEXT="AUTOMATIC";
Query OK, 0 rows affected (0.00 sec)

mysql> START SLAVE;
Query OK, 0 rows affected (0.00 sec)
mysql> SHOW SLAVE STATUS \G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.4.70.201
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: binlog.000006
          Read_Master_Log_Pos: 36663990
               Relay_Log_File: mysqld-relay-bin.000004
                Relay_Log_Pos: 395
        Relay_Master_Log_File: binlog.000004
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB: acesmerp,acesmerp
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 1590
                   Last_Error: The incident LOST_EVENTS occured on the master. M
essage: mysqld startup
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 191
              Relay_Log_Space: 37947913
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 1590
               Last_SQL_Error: The incident LOST_EVENTS occured on the master. M
essage: mysqld startup
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 101
                  Master_UUID: 56c22d30-e767-11e2-a5b6-02bf0a0446c8
             Master_Info_File: E:\MySQL_Data\master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State:
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp: 130724 09:56:57
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set: 56c22d30-e767-11e2-a5b6-02bf0a0446c8:1-2402
            Executed_Gtid_Set: 56c22d30-e767-11e2-a5b6-02bf0a0446c8:1-1034
                Auto_Position: 0
1 row in set (0.00 sec)
[19 Mar 2014 1:54] andrew lorien
This happened for me on a multi-master 5.6.13 Solaris server.  After restoring the broken master from a backup, the slaves were still unable to get past the error.  I had to restore them as well.
You can create a LOST_EVENT incident by following the steps here : http://bugs.mysql.com/bug.php?id=68892
[8 Apr 2014 7:50] andrew lorien
Workaround

What didn't work : stop slave; start slave; reset slave; reset master; set master to... ; set gtid_next ...

What did work : 
- restoring from a backup of the master (the one which caused the problem in the first place)
- SET @@GLOBAL.GTID_PURGED=[one statement past the one which caused the lost_events error].
[2 Jul 2014 15:15] Scott Noyes
The problem is that the LOST_EVENT does not have a GTID, so we can't replace it with an empty transaction. The workaround is to manually execute the next transaction, then reset replication.

Use mysqlbinlog to find the next transaction. It is probably the first transaction in the next log file after the LOST_EVENT. Copy everything up to the COMMIT for that transaction, being sure to include the SET @@SESSION.GTID_NEXT. Even if you're using ROW format, you can still run BINLOG row events in the command line client.

STOP SLAVE;
Run the transaction copied above.
The mysqlbinlog output sets the delimiter to /*!*/;, so set it back:
DELIMITER ;

SET GTID_NEXT=automatic;
RESET SLAVE;
START SLAVE;
[21 Jul 2014 17:51] Sveta Smirnova
Thank you for the report.

I set it to "Verified", because at least our documentation can be clearer.
[24 Sep 2014 14:39] David Moss
The following has been added to the documentation at this page:
http://dev.mysql.com/doc/refman/5.7/en/replication-gtids-failover.html

Restoring GTID mode slaves.  When restoring a slave in a GTID based replication setup that has encountered an error, injecting an empty transaction may not solve the problem because an event does not have a GTID.

Use mysqlbinlog to find the next transaction, which is probably the first transaction in the next log file after the event. Copy everything up to the COMMIT for that transaction, being sure to include the SET @@SESSION.GTID_NEXT. Even if you are not using row-based replication, you can still run binary log row events in the command line client.

Stop the slave and run the transaction you copied. The mysqlbinlog output sets the delimiter to /*!*/;, so set it back:

		mysql> DELIMITER ;
		

Restart replication from the correct position automatically:

		mysql> SET GTID_NEXT=automatic;	
		mysql> RESET SLAVE;	
		mysql> START SLAVE;
[3 Mar 2017 20:22] monty solomon
The text should be updated to reflect that mysqlbinlog should be used to find the next transaction in the relay logs (it says "next log file after the event").

The documentation suggests running the transaction copied from the output of mysqlbinlog but it does not warn about, or provide guidance to avoid, errant GTID transactions created by executing the copied transaction.