Bug #73074 Upgrade from 5.6.20 -> 5.6.21; Replication; 1236 Found old binary log w/o GTID
Submitted: 22 Jun 2014 10:48
Modified: 10 Dec 2016 18:34
Reporter: Van Stokes
Status: No Feedback
Category: MySQL Server: Replication
Severity: S1 (Critical)
Version: 5.6.21 x64, 5.6.22
OS: Any (Windows, Linux)
Assigned to:
CPU Architecture: Any
Tags: 1236, GTID, replication, upgrade

[22 Jun 2014 10:48] Van Stokes
Description:
Configuration: Four masters in circular (looped) replication.
Binlog format: mixed
OS: Ubuntu 12.04 LTS, 14.04 LTS, and Windows Server 2008
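For context, the GTID-related settings involved can be checked on every server like this (a sketch only; the exact values come from each server's my.cnf):

SHOW GLOBAL VARIABLES
 WHERE Variable_name IN ('gtid_mode', 'enforce_gtid_consistency',
                         'log_slave_updates', 'log_bin',
                         'binlog_format', 'server_id');
-- expected on this topology: gtid_mode=ON, enforce_gtid_consistency=ON,
-- log_slave_updates=ON, binlog_format=MIXED, unique server_id per server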

Upgraded all MySQL servers from 5.6.17 to 5.6.19. The upgrade broke replication, which was working fine prior to the upgrade. From the error log:

2014-06-21 18:05:43 32319 [Note] Slave I/O thread: connected to master 'xxxxxxx@yyy-mysql02.mydomain.com:3306',replication started in log 'master-bin.000336' at position 72188042
2014-06-21 18:05:44 32319 [ERROR] Error reading packet from server: Found old binary log without GTIDs while looking for the oldest binary log that contains any GTID that is not in the given gtid set ( server_errno=1236)
2014-06-21 18:05:44 32319 [ERROR] Slave I/O: Got fatal error 1236 from master when reading data from binary log: 'Found old binary log without GTIDs while looking for the oldest binary log that contains any GTID that is not in the given gtid set', Error_code: 1236
2014-06-21 18:05:44 32319 [Note] Slave I/O thread exiting, read up to log 'master-bin.000336', position 72188042

All MySQL servers exhibit this error, including end-point slaves.

How to repeat:
Configure GTID replication using 5.6.17 and then upgrade to 5.6.19.
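Roughly (host and user names below are hypothetical; the real topology is the four-master circle described above):

CHANGE MASTER TO
    MASTER_HOST = 'upstream-master.example.com',  -- hypothetical
    MASTER_USER = 'repl',                         -- hypothetical
    MASTER_PASSWORD = '...',
    MASTER_AUTO_POSITION = 1;                     -- GTID-based positioning
START SLAVE;
-- then: STOP SLAVE, shut down mysqld, upgrade the packages, restart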

Suggested fix:
Not sure. Still investigating how to recover without performing a restore.
[22 Jun 2014 11:35] Van Stokes
I believe the problem is in sql/binlog.cc in read_gtids_from_binlog().

Here is our GTID_EXECUTED:

69cf02cd-1731-11e3-9a19-002590854928:1-55306969,
708bb615-d393-11e3-a682-003048c3ab22:1-13491133,
819c985c-d384-11e3-a621-00259002979a:1-1162440,
9204e764-d379-11e3-a5d9-0013726268ea:1-2431

9204e764-d379-11e3-a5d9-0013726268ea is the local MySQL server.
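For reference, a set like the one above can be read on each server with:

SELECT @@GLOBAL.gtid_executed;
SHOW MASTER STATUS;   -- Executed_Gtid_Set column
SHOW SLAVE STATUS\G   -- Retrieved_Gtid_Set / Executed_Gtid_Set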

I may be mistaken, as I don't have the source installed in a DEV environment to step through it, but it appears to me that the logic is attempting to resolve ALL of the GTID sets within the same binlog file. Therefore, if a file does not contain a GTID for any of the sets (four in this case), it fails; it never searches the other, earlier binlog files.

In our case, it is very possible that a binlog will NOT contain transactions (i.e. GTIDs) for some or all of the sets. For example, this server (9204e764-d379-11e3-a5d9-0013726268ea) is located at our DR site and does not execute transactions unless the site is made active. However, it remains in the replication loop to stay current.
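If someone wants to check whether a given binlog on the master actually carries GTID events, something like this should show it (file name taken from the error log above):

SHOW BINARY LOGS;
SHOW BINLOG EVENTS IN 'master-bin.000336' LIMIT 5;
-- the first events should include a Previous_gtids entry; a file written
-- before GTID mode was enabled would lack Gtid events entirely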
[11 Aug 2014 11:37] Van Stokes
This problem still persists. We upgraded a slave server from 5.6.19 to 5.6.20 and this error happened again. The slave was working fine and was completely synced with the masters prior to the upgrade. After the upgrade we got error 1236 again, and it was non-recoverable. We have attempted several suggestions found on the web and none of them have worked. It appears the only solution is to dump and reload from the master.
[30 Sep 2014 15:06] Van Stokes
And the same thing happened when upgrading from 5.6.20 to 5.6.21.
READ-ONLY Slave server is failing with this error:

2014-09-30 11:02:12 12018 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 0, relay log './slave-relay-bin.000001' position: 4
2014-09-30 11:02:12 12018 [Note] Slave I/O thread: connected to master 'rs_2001@atl-mysql02.econocaribe.com:3306',replication started in log 'FIRST' at position 4
2014-09-30 11:02:12 12018 [ERROR] Error reading packet from server: Found old binary log without GTIDs while looking for the oldest binary log that contains any GTID that is not in the given gtid set ( server_errno=1236)
2014-09-30 11:02:12 12018 [ERROR] Slave I/O: Got fatal error 1236 from master when reading data from binary log: 'Found old binary log without GTIDs while looking for the oldest binary log that contains any GTID that is not in the given gtid set', Error_code: 1236
2014-09-30 11:02:12 12018 [Note] Slave I/O thread exiting, read up to log 'FIRST', position 4

Here is the Executed GTID Set:

69cf02cd-1731-11e3-9a19-002590854928:1-68880629,
708bb615-d393-11e3-a682-003048c3ab22:1-17851697,
78ae4d94-d37a-11e3-a5df-005056a25fd0:1-25,
819c985c-d384-11e3-a621-00259002979a:1-7183187,
9204e764-d379-11e3-a5d9-0013726268ea:1-24

I have tried STOP SLAVE -> RESET SLAVE -> START SLAVE
and it will not start.
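Spelled out, that was:

STOP SLAVE;
RESET SLAVE;
START SLAVE;
SHOW SLAVE STATUS\G   -- Last_IO_Errno: 1236 comes right back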

MASTER server (also upgraded to 5.6.21) is running fine.
[6 Nov 2014 6:25] MySQL Verification Team
Hello Van,

Thank you for the report.
I could not reproduce this issue at my end.
Could you please help us reproduce this issue further and provide the master/slave config files (please make them private if you prefer) along with exact repeatable steps?

Thanks,
Umesh
[7 Dec 2014 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[9 Dec 2014 21:36] фыв йцуйцу
I have the same issue with a slave server after upgrading 5.6.20 -> 5.6.21. The slave has SQL_Delay set to 129600 seconds.
[17 Jan 2015 22:39] Van Stokes
Just had the same error AGAIN after upgrading from 5.6.21 to 5.6.22.

Got fatal error 1236 from master when reading data from binary log: 'Found old binary log without GTIDs while looking for the oldest binary log that contains any GTID that is not in the given gtid set'

All slaves failed.
[19 Jan 2015 13:23] Van Stokes
Master my.cnf configuration file.

Attachment: master.my.cnf (application/octet-stream, text), 8.45 KiB.

[19 Jan 2015 13:23] Van Stokes
Slave (and master) my.cnf configuration file

Attachment: slave.my.cnf (application/octet-stream, text), 8.47 KiB.

[22 Jan 2015 21:14] Sveta Smirnova
Thank you for the report.

Have you purged or manually deleted binary logs? Have you ever switched from GTID mode to "regular" replication after setting it up?
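For example (sketch), the answers can be checked on each server with:

SHOW GLOBAL VARIABLES LIKE 'gtid_mode';
SELECT @@GLOBAL.gtid_purged;   -- non-empty if binary logs holding GTIDs were purged
SHOW BINARY LOGS;              -- compare with the files on disk to spot manual deletion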
[23 Jan 2015 12:52] Van Stokes
No. None of the above.

All we did to perform the upgrade was:
1) stop replication (STOP SLAVE)
2) shutdown MySQL server (service mysql stop)
3) perform the upgrade (apt-get update ...)
4) start MySQL server (service mysql start)

and the error occurred on all servers.

We did try just a RESET SLAVE on all servers, but that didn't work. We then recorded all of the Executed GTIDs (per server) and performed a RESET SLAVE ALL followed by a CHANGE MASTER and re-setting the Executed GTIDs, but that did not work either.

In order to "fix" the problem, we had to perform a MASTER RESET and a SLAVE RESET ALL on all servers. I shouldn't have to tell you what a catastrophic action this was.

You should be aware that we have FOUR (4) MASTER servers in circular replication (A->B->C->D->A), with each having one or more READ-ONLY slaves. All servers have the same my.cnf settings except for those settings that are server specific.

I have a sneaking suspicion it has something to do with a GTID being consumed during the MySQL shutdown process and not being (properly?) recorded in the binary log of the MySQL server. See this bug report:

"Server consumes a GTID on shutdown - slaves show missing executed GTID"
http://bugs.mysql.com/bug.php?id=74687

I think what happened is that the MySQL server consumed a GTID that wasn't (properly?) recorded in its binary log. At startup, the slave I/O thread looks for a GTID that doesn't exist in the (first? most recent?) master's binary log and then gives up - or something to that effect. But this could be a red herring too, so I defer to your expertise.
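One way to check that suspicion (sketch; the GTID set strings are placeholders):

-- on master and slave, before and after the restart:
SELECT @@GLOBAL.gtid_executed, @@GLOBAL.gtid_purged;
-- compare the two sets in both directions:
SELECT GTID_SUBTRACT('<master gtid_executed>', '<slave gtid_executed>') AS needed_by_slave;
SELECT GTID_SUBTRACT('<slave gtid_executed>', '<master gtid_executed>') AS unknown_to_master;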

The error message is far too ambiguous for us to troubleshoot. If possible, the error message should be modified to make this issue clearer and easier to troubleshoot. If applicable, it should include the server id and the GTID(s) that are causing the issue.
[10 Nov 2016 18:34] MySQL Verification Team
Please check if you are getting the same issue upgrading to latest release 5.6.34. Thanks.
[11 Dec 2016 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".