Bug #72635 data inconsistencies when master has truncated binary log with GTID after crash
Submitted: 13 May 2014 18:24 Modified: 8 Dec 2014 15:34
Reporter: Santosh Praneeth Banda
Status: Closed
Category: MySQL Server: Replication Severity: S2 (Serious)
Version: 5.6.16, 5.6.17 OS: Any
Assigned to: CPU Architecture: Any

[13 May 2014 18:24] Santosh Praneeth Banda
The master is running with GTID enabled, sync_binlog=1, and innodb_flush_log_at_trx_commit=1.
After a crash, the master may be left with a truncated binary log because of a hardware error (RAID cache failure).

Without GTID, slaves fail with the error "Error reading packet from server: Client requested master to start replication from position > file size".

With GTID, slaves silently skip transactions because the master reuses GTIDs
that the slaves have already executed.

This causes data inconsistencies on the slaves, and the slaves may later fail with duplicate key errors.
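
To make the failure mode concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions: a GTID is modeled as a (uuid, number) pair, `MASTER_UUID` and the row values are hypothetical placeholders, and the "skip if already executed" decision is collapsed into one function, whereas in the real server the filtering is split between the master's dump thread and the slave.

```python
# Toy model of GTID-based skipping. A GTID is (server_uuid, transaction_number).
MASTER_UUID = "master-uuid"  # hypothetical placeholder, not a real server UUID


def replicate(master_binlog, slave_executed, slave_data):
    """Apply master transactions on the slave, skipping GTIDs it already has."""
    skipped = []
    for gtid, row in master_binlog:
        if gtid in slave_executed:
            # The slave considers this transaction already applied.
            skipped.append(gtid)
            continue
        slave_executed.add(gtid)
        slave_data.append(row)
    return skipped


# Before the crash: the master committed transactions 1..5 and the slave
# replicated all of them.
slave_executed = {(MASTER_UUID, n) for n in range(1, 6)}
slave_data = [f"old-row-{n}" for n in range(1, 6)]

# After the crash: the binary log is truncated to 1..3, so the master hands
# out numbers 4 and 5 again for brand-new transactions.
master_binlog = [((MASTER_UUID, n), f"old-row-{n}") for n in range(1, 4)]
master_binlog += [((MASTER_UUID, 4), "new-row-4"), ((MASTER_UUID, 5), "new-row-5")]

skipped = replicate(master_binlog, slave_executed, slave_data)

# The new transactions were skipped without any error, so the slave still
# holds the old rows for GTIDs 4 and 5 and never sees the new ones.
print("new-row-4" in slave_data, "old-row-4" in slave_data)
```

In this model no error is ever raised, which mirrors the "silent" part of the report: the two data sets simply diverge.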

How to repeat:
see description

Suggested fix:
Avoid slaves silently skipping transactions.
[13 May 2014 18:24] Santosh Praneeth Banda
Updating severity level
[19 May 2014 9:20] Umesh Shastry
Hello Santosh,

Thank you for the bug report.
Verified as described.

[19 May 2014 9:24] Umesh Shastry
// Master/Slave with MySQL version 5.6.17

With GTID enabled - No issue was reported (the slave stayed up and even synced new data), but data inconsistencies were observed (the events that were truncated during the crash were lost)

Without GTID enabled - Slave's IO thread stopped with:

 Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Client requested master to start replication from position > file size; the first event 'master-bin.000003' at 3382, the last event read from './master-bin.000003' at 4, the last byte read from './master-bin.000003' at 4.'
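
For contrast, the position-based failure above is loud because a file offset can be compared directly against the file size. The following Python sketch illustrates that idea only; `BinlogPositionError`, `check_start_position`, and the truncated size of 1024 bytes are hypothetical names and values, not the server's actual code or the real file size in this test.

```python
class BinlogPositionError(Exception):
    """Hypothetical stand-in for the fatal error 1236 reported to the slave."""


def check_start_position(requested_pos: int, binlog_size: int) -> None:
    """Reject a start position that lies beyond the end of the binary log."""
    if requested_pos > binlog_size:
        raise BinlogPositionError(
            "Client requested master to start replication "
            "from position > file size"
        )


TRUNCATED_SIZE = 1024  # hypothetical size of the truncated binary log

try:
    # 3382 is the position the slave asked for in the error message above.
    check_start_position(3382, TRUNCATED_SIZE)
    outcome = "replication continues"
except BinlogPositionError as exc:
    outcome = f"I/O thread stops: {exc}"

print(outcome)
```

With GTID auto-positioning there is no such offset to compare, which is why the equivalent check has to be made on GTID sets instead.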
[8 Dec 2014 15:34] David Moss
Thanks for your feedback. The following was added to the 5.6.23 and 5.7.6 changelog with commit 4747:
In normal usage, it is not possible for a slave to have more GTIDs than the master. But in certain situations, such as after a hardware failure or incorrectly cleared gtid_purged, the master's binary log could be truncated. This fix ensures that in such a situation, the master now detects that the slave has transactions with GTIDs which are not on the master. An error is now generated on the slave and the I/O thread is stopped with an error. The master's dump thread is also stopped. This prevents data inconsistencies during replication.
[12 Feb 2015 12:47] Laurynas Biveinis
$ git show -s 6e6add6
commit 6e6add6bb5649b6f75579c86f5a4a51e95c54fb6
Author: Venkatesh Duggirala <venkatesh.duggirala@oracle.com>
Date:   Tue Nov 18 09:54:31 2014 +0530

    Master's dump thread is not detecting the case where Slave's
    gtid executed set is having more gtids than Master's gtid
    executed set with respect to Master's UUID.

    Analysis & Fix:
    In normal scenarios, it is not possible that Slave will
    contain more gtids than Master with respect to Master's UUID.
    But it could be a possible case if Master's binary log is
    truncated (due to raid failure) or Master's binary log is
    deleted but GTID_PURGED was not set properly. That scenario
    needs to be validated, i.e., it should *always* be the case that
    Slave's gtid executed set (+retrieved set) is a subset of
    Master's gtid executed set with respect to Master's UUID.
    If it is not, Master's dump thread will be stopped and this
    situation will be reported to Slave during the handshake (thus
    the slave I/O thread will also be stopped with an error,
    ER_MASTER_FATAL_ERROR_READING_BINLOG). Otherwise, it can lead
    to data inconsistency between Master and Slave.
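
The subset rule described in the commit message above can be sketched in Python as follows. This is an illustration under stated assumptions, not the server's implementation: `parse_gtid_set` and `gtids_missing_on_master` are hypothetical helper names, the UUID is an example value, the textual GTID set format is assumed to be `uuid:interval[:interval]...` with entries separated by commas, and intervals are expanded into plain Python sets, which is only practical for small examples. The caller is assumed to pass the union of the slave's executed and retrieved sets.

```python
def parse_gtid_set(text):
    """Parse 'uuid:1-5:7,uuid2:1-3' into {uuid: set of transaction numbers}."""
    result = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        numbers = result.setdefault(uuid.lower(), set())
        for interval in intervals:
            start, _, end = interval.partition("-")
            # A single number such as '7' is the interval 7-7.
            numbers.update(range(int(start), int(end or start) + 1))
    return result


def gtids_missing_on_master(master_executed, slave_gtids, master_uuid):
    """Return transaction numbers for master_uuid that the slave has but the master lacks."""
    key = master_uuid.lower()
    master = parse_gtid_set(master_executed).get(key, set())
    slave = parse_gtid_set(slave_gtids).get(key, set())
    return sorted(slave - master)


UUID = "3e11fa47-71ca-11e1-9e33-c80aa9429562"  # example UUID for illustration

# Master's binary log was truncated back to transaction 3; the slave has 1..5.
extra = gtids_missing_on_master(f"{UUID}:1-3", f"{UUID}:1-5", UUID)
if extra:
    print("stop dump thread, slave is ahead by:", extra)
```

An empty result means the slave's set is a subset of the master's for that UUID and replication may proceed; any non-empty result corresponds to the case where, after the fix, both the dump thread and the slave I/O thread stop with an error.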