Bug #90001 GTID: Replication failure messages do not indicate the failed uuid:<sequence>
Submitted: 10 Mar 2018 9:13 Modified: 16 Mar 2018 8:57
Reporter: Simon Mudd (OCA) Email Updates:
Status: Won't fix Impact on me:
None 
Category:MySQL Server: Replication Severity:S4 (Feature request)
Version:5.7.21 OS:Any
Assigned to: CPU Architecture:Any
Tags: auto-repair, GTID, gtid_next, show slave status, UUID

[10 Mar 2018 9:13] Simon Mudd
Description:
If using GTID and you get a replication error there is no logging of the GTID_NEXT value (the one that is causing the issue), only binlog file and position are logged.

I recently saw a duplicate key error.
If the data is already there I might want to skip this GTID value and move to the next. There are instructions on https://www.percona.com/blog/2013/03/26/repair-mysql-5-6-gtid-replication-by-injecting-emp... on how to do this but the process is manual.

It is also more complicated if your gtid_executed has multiple uuid:<ranges> given the master may have been modified several times previously.

If I want to "repair this" that is skip this single transaction I can't I need to know the gtid_next value to inject on the server so that mysqld thinks it wants to apply. That value is not present in the error message or the output of SHOW SLAVE STATUS, and @@gtid_next is automatic.

I'd like to be able to do everything from the mysql SQL prompt but am not sure how I can do this with the information provided.

How to repeat:
* make a master slave using GTID
* add a value to the slave first then add it on the master
* this will trigger a duplicate key error
* try to skip the duplicate key error this using the information available on the broken server itself (via the SQL interface)

Suggested fix:
Ideas:
* make sql_slave_skip_counter work with GTID.   mysqld has the information to do this. It just requires a bit more magic.  This facility is well known and quite useful.
* add information on the UUID:SEQUENCE_NUMBER (gtid_next?) which is causing the issue (log it and make it visible in SHOW SLAVE STATUS output)
* ensure that the process can be automated only using the SQL interface.

This reminds me of a feature request I made in 2010: See: bug#54250 (though that didn't refer to GTID) which is still not out of place even 8 years later.
[10 Mar 2018 9:17] Simon Mudd
The reason for mentioning the other FR is that it was suggested to me to change slave_exec_mode to 'IDEMPOTENT' and skip the issues that way.

Doing so works and leaves a trace in the log files but no counters and can't be done for a fixed number of transactions/statements, whereas the suggested auto-fix mode would include both counters and also not do unsafe changes, so be more trustworthy.
[10 Mar 2018 10:50] Simon Mudd
Another suggestion which sounds possible is to use SHOW SLAVE STATUS to get the relaylog file/position and then to use SHOW RELAYLOG EVENTS at that position to get the gtid_next value.  This seems like a work around which I can use now.

All of this is to find the GTID_NEXT value which is working around that this value is not presented to the user. (Using GTID is supposed to be easier than managing file/positions).
[16 Mar 2018 4:59] MySQL Verification Team
Hello Simon,

Thank you for the report and feature request!

Regards,
Umesh
[16 Mar 2018 8:57] Sven Sandberg
Posted by developer:
 
The GTID of the transaction that generated the error is exposed in
performance_schema.replication_applier_status_by_worker, column LAST_SEEN_TRANSACTION.
In 8.0, it is in the same table, column APPLYING_TRANSACTION.

So you can execute the empty transaction you need using something like this:

SELECT LAST_SEEN_TRANSACTION FROM performance_schema.replication_applier_status_by_worker WHERE LAST_ERROR_NUMBER != 0 LIMIT 1 INTO @error_gtid;
SET @@session.gtid_next = @error_gtid;
COMMIT;
SET @@session.gtid_next = 'AUTOMATIC';