Bug #71618 Improve logging of RBR / GTID replication failure
Submitted: 7 Feb 2014 9:17 Modified: 2 Dec 2014 21:22
Reporter: Simon Mudd (OCA) Email Updates:
Status: Verified Impact on me:
Category:MySQL Server: Replication Severity:S4 (Feature request)
Version:5.6.16 OS:Any
Assigned to: CPU Architecture:Any
Tags: GTID, RBR, replication, windmill
Triage: Needs Triage: D5 (Feature request)

[7 Feb 2014 9:17] Simon Mudd
A recent issue on a RBR, GTID based slave gives me this error message:

2014-02-07 10:00:07 16467 [ERROR] Slave SQL: Could not execute Update_rows event on table some_table_name; Can't find record in 'events', Error_code: 1032; 
                handler error HA_ERR_KEY_NOT_FOUND; the event's master log binlog.001253, end_log_pos 1064770575, Error_code: 1032
2014-02-07 10:00:07 16467 [Warning] Slave: Can't find record in 'events' Error_code: 1032

The information to help me recover from this is really lacking.

How to repeat:
See above.
The cause: the slave crashed due to bug#69075 and due to "performance problems" on the server I had temporarily set sync_binlog=0, innodb_flush_log_at_trx_commit = 2 (Yes, I'm aware of the consequences of doing this) and also I also had binlog_row_image = minimal.

So some sort of breakage was expected.

Suggested fix:
1. Given the error it is extremely helpful to also show the failed RBR update, that is the PK for the failed Update, as that allows me to locate the "broken row" quickly.

2. Given I am using GTID_MODE = ON, it is also extremely helpful to tell me the gtid that is being executed that has failed, so there is a record of this.  This allows me more quickly to locate the full entry in the relay logs (locally) or the binlog of the master (remotely).

3. This also goes back to bug#54250 reported 4 years ago, which I think is just as valid as it was when I wrote it. I think there are not that many feature requests like this because not many people use GTID, or at least in environments where enough breakages happen for these things to matter. So please consider implementing bug#52450.
[7 Feb 2014 10:30] Simon Mudd
4. Also record the equivalent relay log pos positions as it's much easier looking on the local server for information about events than having to log into the master and find that info there.
5. Why are the end_log_pos values shown? Please also indicate the start of the event that caused the problem. Other logging usually shows the start positions, and it saves me having to look back to find the beginning of the event.
[2 Dec 2014 21:22] Sveta Smirnova
Thank you for the reasonable feature request.
[12 Jun 2015 6:34] Daniƫl van Eeden
If binlog_rows_query_log_events is enabled it might also show the RowsQueryEvent text