Bug #81692 GTID_PURGED sequence improperly initialized on restart
Submitted: 2 Jun 2016 14:35 Modified: 3 Sep 2016 13:42
Reporter: Ryan Masse Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: Replication Severity:S2 (Serious)
Version:5.7.12 OS:FreeBSD (10.3)
Assigned to: CPU Architecture:Any

[2 Jun 2016 14:35] Ryan Masse
Description:
When trying to create a new slave from a logical dump of the master, when the slave server is first restarted after successfully establishing a connection to the master, the value of GTID_PURGE is improperly deduced.

The logical backup contains the following:

SET @@GLOBAL.GTID_PURGED='b6ec2e13-1ca0-11e6-9911-00219b9f8f39:1-1751982';

The slave server is left to catch up with all of the pending transactions from the master.  Once caught up, if the slave server is restarted, the value of GTID_PURGED is incorrect.

In our case upon restart the value of GTID_PURGED becomes b6ec2e13-1ca0-11e6-9911-00219b9f8f39:1-1132143 as an example.  This caused the slave to fail miserably. 

We have replicated this same behavior on a number of servers when performing the below steps.

How to repeat:

On the master to create a logical backup we use:

mysqldump -u root -p --skip-lock-tables --single-transaction --hex-blob --all-databases --routines --triggers --events > all.sql

In this example all.sql contained:

SET @@GLOBAL.GTID_PURGED='b6ec2e13-1ca0-11e6-9911-00219b9f8f39:1-1751982';

On the slave we do:

RESET SLAVE;
RESET MASTER;
source 'all.sql';
START SLAVE;

The slave comes up perfectly fine with a GTID_PURGED=b6ec2e13-1ca0-11e6-9911-00219b9f8f39:1-1751982

The server is left to catch up with the master.

The server is later restarted, and upon inspection of SHOW SLAVE STATUS and inspecting the GTID_PURGED the server is trying to process transactions from a much earlier point in time that have already been applied to this slave.

We tried binlog_gtid_simple_recovery=TRUE AND FALSE with no change in the behavior.

No alterations have been made to the binary logs on the slave server during this process.

The master and slave have a near identical copy of my.cnf

The relevant settings are as follows:

[mysqld]

# BINARY LOGGING #
log-bin                         = /var/db/mysql/mysql-bin
relay-log                       = /var/db/mysql/mysql-relay-bin
expire-logs-days                = 14
sync-binlog                     = 1
server-id                       = 25460
binlog-format                   = ROW
#only send changed columns to slave. reduces volume of data in binlog
binlog-row-image                = MINIMAL
log_slave_updates               = ON
slave_parallel_workers          = 6
slave_parallel_type             = LOGICAL_CLOCK
slave_preserve_commit_order     = ON
relay_log_info_repository       = TABLE
expire-logs-days                = 7

# GTID REPLICATION #
gtid_mode                       = ON
enforce_gtid_consistency        = ON
binlog_gtid_simple_recovery     = FALSE
# GTID REPLICATION #
[9 Jun 2016 14:49] Bryan Aldridge
Having same issue here.  It seems the entries in the table mysql.gtid_executed are consulted when initializing the gtid_executed value on the slave, after a restart.  Since it looks like mysql.gtid_executed is not updated unless the server is restarted or the binary log is rotated, when a mysqldump is taken, and the mysql database is included, this information gets loaded into the slave.  Since the logical backup created by mysqldump has "SET @@SESSION.SQL_LOG_BIN= 0;
", no binary logs are created on the slave upon import.  

On Master:

mysql> show master status;
+------------------+----------+--------------+------------------+---------------------------------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                           |
+------------------+----------+--------------+------------------+---------------------------------------------+
| mysql-bin.000004 |      626 |              |                  | 95c3671b-1c63-11e6-b851-005056a32605:1-1662 |
+------------------+----------+--------------+------------------+---------------------------------------------+
1 row in set (0.00 sec)

mysql> select * from mysql.gtid_executed;
+--------------------------------------+----------------+--------------+
| source_uuid                          | interval_start | interval_end |
+--------------------------------------+----------------+--------------+
| 95c3671b-1c63-11e6-b851-005056a32605 |              1 |         1655 |
| 95c3671b-1c63-11e6-b851-005056a32605 |           1656 |         1658 |
| 95c3671b-1c63-11e6-b851-005056a32605 |           1659 |         1660 |
+--------------------------------------+----------------+--------------+
3 rows in set (0.00 sec)

# mysqldump --all-databases --single-transaction --triggers --routines --events --flush-privileges > db_backup.sql

On Slave:

mysql> reset master;

# mysql < db_backup.sql

mysql> show master status;
+------------------+----------+--------------+------------------+---------------------------------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                           |
+------------------+----------+--------------+------------------+---------------------------------------------+
| mysql-bin.000001 |      154 |              |                  | 95c3671b-1c63-11e6-b851-005056a32605:1-1662 |
+------------------+----------+--------------+------------------+---------------------------------------------+
1 row in set (0.00 sec)

mysql> select * from mysql.gtid_executed;
+--------------------------------------+----------------+--------------+
| source_uuid                          | interval_start | interval_end |
+--------------------------------------+----------------+--------------+
| 95c3671b-1c63-11e6-b851-005056a32605 |              1 |         1655 |
| 95c3671b-1c63-11e6-b851-005056a32605 |           1656 |         1658 |
| 95c3671b-1c63-11e6-b851-005056a32605 |           1659 |         1660 |
+--------------------------------------+----------------+--------------+
3 rows in set (0.01 sec)

# service mysqld restart
Stopping mysqld:                                           [  OK  ]
Starting mysqld:                                           [  OK  ]

mysql> show master status;
+------------------+----------+--------------+------------------+---------------------------------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                           |
+------------------+----------+--------------+------------------+---------------------------------------------+
| mysql-bin.000002 |      154 |              |                  | 95c3671b-1c63-11e6-b851-005056a32605:1-1660 |
+------------------+----------+--------------+------------------+---------------------------------------------+
1 row in set (0.00 sec)

Notice how transaction 1662 is no longer in the executed set.  This causes issues such as duplicate key errors when the slave is restarted.
[9 Aug 2016 13:32] MySQL Verification Team
Hi,

Please, just do not dump database `mysql`, but always create it for each new server.
[15 Aug 2016 14:15] Ryan Masse
This is not an acceptable answer.  When performing a logical backup using mysqldump on all databases there is no way to exclude the mysql database without creating a custom script to accomplish the task.

The proper solution is to either modify the behaviour of mysqldump or mysqld itself.

This is also occurring on an Ubuntu 16.04 server.
[16 Aug 2016 2:29] Bryan Aldridge
I agree this is not acceptable.  mysqldump has always been the tried and true way to create a new slave from a master.  Not dumping the mysql database, and therefore not being able to use the "--all-databases" option when creating a dump for a new slave is an odd restriction, isn't mentioned in documentation, and creates a trap that many users would seemingly fall into. The documentation also is a bit lacking on the topic of using mysqldump to provision a new GTID-based slave.

In addition, when creating a mysqldump from a running master, we are even given the following warning if we don't pass in "--all-databases":

"Warning: A partial dump from a server that has GTIDs will by default include the GTIDs of all transactions, even those that changed suppressed parts of the database. If you don't want to restore GTIDs, pass --set-gtid-purged=OFF. To make a complete dump, pass --all-databases --triggers --routines --events."

So it seems like your comment to not dump the mysql database is at odds with this.
[3 Sep 2016 10:27] MySQL Verification Team
Thank you for the feedback.
This is most likely duplicate of Bug #82848

Thanks,
Umesh
[3 Sep 2016 13:42] Ryan Masse
I'm glad that this was acknowledged and a patch was subsequently provided under #82848.

I recognize that this is open source and community driven.  My only rub is that our ticket was create June 2nd and never "Verified". 82848 was created September 2nd and verified as a bug the same day.  Doesn't that seam odd?
[12 Sep 2016 11:14] Daniƫl van Eeden
> I recognize that this is open source and community driven.  My only rub is that our ticket was create 
> June 2nd and never "Verified". 82848 was created September 2nd and verified as a bug the same day.
> Doesn't that seam odd?

Yes this one should have been verified and Bug #82848 (my bug) should have been marked as a duplicate.
I should have searched a bit more before creating it.

I guess the reason to mark the older bug as duplicate is that my bug contains a patch and more info.

I encountered this bug on 5.7.14 and you reported it on 5.7.12. The fix is simple, easy to create and
is quite safe. So if your bug was verified in time it could have been fixed in 5.7.13 or 5.7.14 and
I might have never noticed it.

My GitHub pull request was put into Bug #82854, so we now have 3 bug numbers (2 verified, 1 duplicate)
for the same thing. 

I did also open a service request on https://support.oracle.com (requires support contract). But I 
don't think that changed anything for this particular bugreport as it was verified by Umesh (who works
on community bugs verification) and not by the engineer assigned to the SR.

Thanks for creating the initial bug report for this issue :)