Bug #74385 GTID replication randomly skipping transactions
Submitted: 14 Oct 2014 18:24 Modified: 25 May 2018 9:52
Reporter: Van Stokes Email Updates:
Status: No Feedback Impact on me:
None 
Category:MySQL Server: Replication Severity:S2 (Serious)
Version:5.6.21 x64 OS:Any (Ubuntu 14.04.1 x64)
Assigned to: CPU Architecture:Any
Tags: replication, server

[14 Oct 2014 18:24] Van Stokes
Description:
All servers are running MySQL 5.6.21 x64 on Ubuntu.
No slaves running parallel threads.
Row based replication

For reasons we cannot figure out, sometimes a server will just not execute a transaction it has retrieved. 

We have a circular (loop) replication of masters:
MasterA -> MasterB -> MasterC -> MasterD -> MasterA

Slaves:
MasterA -> SlaveA
MasterD -> SlaveD

MasterD binlog entry:

#141011 16:54:46 server id 1003  end_log_pos 934490 CRC32 0xc234b4f8  GTID [commit=yes]
SET @@SESSION.GTID_NEXT= '69cf02cd-1731-11e3-9a19-002590854928:84680970'/*!*/;
# at 934490
#141011 16:54:46 server id 1003  end_log_pos 934592 CRC32 0xbae2ae0b  Query thread_id=38  exec_time=65281 error_code=0
SET TIMESTAMP=1413060886/*!*/;
SET @@session.sql_mode=1075838976/*!*/;
SET @@session.auto_increment_increment=4, @@session.auto_increment_offset=3/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=8/*!*/;
DELETE FROM `wiki_it`.`hitcounter`
/*!*/;
# at 934592
#141012 11:02:28 server id 1001  end_log_pos 934640 CRC32 0xa2598f0b  GTID [commit=yes]
SET @@SESSION.GTID_NEXT= '819c985c-d384-11e3-a621-00259002979a:9109192'/*!*/;
# at 934640

MasterD Executed GTIDs
(MasterB) 69cf02cd-1731-11e3-9a19-002590854928:1-88715639,
(MasterA) 708bb615-d393-11e3-a682-003048c3ab22:1-30841337,
(MasterC) 819c985c-d384-11e3-a621-00259002979a:1-9433702,
(MasterD) 9204e764-d379-11e3-a5d9-0013726268ea:1-5399

SlaveD Retrieved GTIDs
69cf02cd-1731-11e3-9a19-002590854928:84680970:87340384-88500015,
708bb615-d393-11e3-a682-003048c3ab22:29784388-30731708,
819c985c-d384-11e3-a621-00259002979a:9272340-9419533,
9204e764-d379-11e3-a5d9-0013726268ea:2607-4060

SlaveD Executed GTIDs
69cf02cd-1731-11e3-9a19-002590854928:1-84680969:84680971-88499824,
708bb615-d393-11e3-a682-003048c3ab22:1-30731704,
78ae4d94-d37a-11e3-a5df-005056a25fd0:1-2,
819c985c-d384-11e3-a621-00259002979a:1-9419255,
9204e764-d379-11e3-a5d9-0013726268ea:1-4043

MasterA Retrieved GTIDs
69cf02cd-1731-11e3-9a19-002590854928:84680970:85783439-88716105,
708bb615-d393-11e3-a682-003048c3ab22:30732392,
819c985c-d384-11e3-a621-00259002979a:9109168-9433949,
9204e764-d379-11e3-a5d9-0013726268ea:544-5419

MasterA Executed GTIDs
69cf02cd-1731-11e3-9a19-002590854928:1-84680969:84680971-88716105,
708bb615-d393-11e3-a682-003048c3ab22:1-30841433,
819c985c-d384-11e3-a621-00259002979a:1-9433949,
9204e764-d379-11e3-a5d9-0013726268ea:1-5419

How to repeat:
Note that MasterD exeucted the transaction.
(MasterB) 69cf02cd-1731-11e3-9a19-002590854928:1-88715639,

Note that SlaveD retrieved the transaction:
69cf02cd-1731-11e3-9a19-002590854928:84680970:87340384-88500015

Note that SlaveD did NOT execute the transaction:
69cf02cd-1731-11e3-9a19-002590854928:1-84680969:84680971-88499824

Note that MasterA retrieved the transaction:
69cf02cd-1731-11e3-9a19-002590854928:84680970:85783439-88716105,

Note that MasterA did NOT execute the transaction:
69cf02cd-1731-11e3-9a19-002590854928:1-84680969:84680971-88716105,

Suggested fix:
We are not sure why the slave will/has not execute the transaction.
[14 Oct 2014 19:22] Van Stokes
We have tried this step to see if we can force the server to execute the missing GTID:

On server missing the GTID:

STOP SLAVE;
RESET SLAVE;
START SLAVE;

This did NOT work.
The SLAVE_IO retrieves the missing GTID from the master but will not execute it.
[15 Oct 2014 11:00] Van Stokes
FYI, restarting the MySQL server (application) with 

relay-log-recovery = 1

in the my.cnf appears to correct the issue.

Why must the MySQL server (i.e. the application) be restarted to correct this issue!?
[15 Oct 2014 11:28] Van Stokes
The implications of this issue is extremely serious. Did you notice that the skipped transaction was a DELETE? Now, AFTER many possible OTHER table changes have already been applied the OLD statement was applied (cart before the horse!). The result is that table on the servers are now NOT in sync because of the order the application of the transactions. MasterA and SlaveD are NOT in sync with MasterB, MasterC and MasterD.

I see a massive flaw in the GTID concept. It's missing an important element. A database-table descriptor.

Current GTID schema: Server-UUID:Transaction-ID
Ex: 69cf02cd-1731-11e3-9a19-002590854928:84680970

Suggested new GTID schema: Server-UUID:Database-ID:Table-ID:Transaction-ID

Or, skip the whole server concept and bring it to the table level, assign a UUID per table. 

Suggested alternative GTID schema: Table-UUID:transaction-id

This way if a transaction is missing no further updates to the table are processed until the missing transaction is recovered. However, add a new option like slave-proceed-on-missing-gtid = 1.
[4 Nov 2014 16:20] Van Stokes
May be related to this bug report: http://bugs.mysql.com/bug.php?id=74687
[25 Apr 2018 9:52] MySQL Verification Team
Please check for related issue: https://bugs.mysql.com/bug.php?id=71529. Thanks.
[26 May 2018 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".