Bug #73032 setting gtid_purged may break auto_position and thus slaves
Submitted: 18 Jun 2014 9:54 Modified: 3 Nov 2014 9:34
Reporter: Santosh Praneeth Banda Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Replication Severity:S2 (Serious)
Version:5.6.19 OS:Any
Assigned to: CPU Architecture:Any

[18 Jun 2014 9:54] Santosh Praneeth Banda
Description:
Setting gtid_purged on a newly built server with GTID rotates the binary log. The previous_gtid_log_event is NULL in the first binary log and not NULL in the second binary log.

set global gtid_purged = 'UUID:1-1000';

new server binary logs

binary_log.1  /* previous_gtid_log_event is NULL */
binary_log.2  /* previous_gtid_log_event is UUID:1-1000 */

Now when a slave with gtid_exeucted UUID:1-100 is pointed to the new server, the new server starts sending events from binary_log.1 which is wrong. Instead the slave io_thread should have failed with the error "The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires". 

How to repeat:
see description

Suggested fix:
set global gtid_purged needs to purge binary_log.1
[18 Jun 2014 13:50] MySQL Verification Team
Hello Santosh,

Thank you for the report.
Verified as described.

Thanks,
Umesh
[18 Jun 2014 13:51] MySQL Verification Team
// How to repeat

// Setting Up Replication Using GTIDs - http://dev.mysql.com/doc/refman/5.6/en/replication-gtids-howto.html

Master> bin/mysqld_safe --defaults-file=./73032M.cnf --gtid_mode=ON --log-bin --log-slave-updates --enforce-gtid-consistency --user=root &
 Slave> bin/mysqld_safe --defaults-file=./73032S.cnf --gtid_mode=ON --log-bin --log-slave-updates --enforce-gtid-consistency --skip_slave_start --user=root &

// No transaction should be executed on master in order to set gtid_purged = 'UUID:1-1000', else most likely to see ERROR 1840 (HY000)

// Master 

mysql> show master status;
+-------------------+----------+--------------+------------------+-------------------+
| File              | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+-------------------+----------+--------------+------------------+-------------------+
| master-bin.000003 |      151 |              |                  |                   |
+-------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)

mysql> show variables like 'server_uuid';
+---------------+--------------------------------------+
| Variable_name | Value                                |
+---------------+--------------------------------------+
| server_uuid   | 7edc7076-f82f-11e3-953c-00163e44510c |
+---------------+--------------------------------------+
1 row in set (0.01 sec)

mysql> show master logs;
+-------------------+-----------+
| Log_name          | File_size |
+-------------------+-----------+
| master-bin.000001 |     65294 |
| master-bin.000002 |   1071072 |
| master-bin.000003 |       151 |
+-------------------+-----------+
3 rows in set (0.00 sec)

mysql> set global gtid_purged = '7edc7076-f82f-11e3-953c-00163e44510c:1-1000';
Query OK, 0 rows affected (0.02 sec)

mysql> show master logs;
+-------------------+-----------+
| Log_name          | File_size |
+-------------------+-----------+
| master-bin.000001 |     65294 |
| master-bin.000002 |   1071072 |
| master-bin.000003 |       199 |
| master-bin.000004 |       191 | <-------------- new binlog created
+-------------------+-----------+
4 rows in set (0.00 sec)

// Slave

mysql> show variables like 'server_uuid';
+---------------+--------------------------------------+
| Variable_name | Value                                |
+---------------+--------------------------------------+
| server_uuid   | 9e19bc8f-f82f-11e3-953d-00163e44510c |
+---------------+--------------------------------------+
1 row in set (0.00 sec)

mysql> show master logs;
+------------------+-----------+
| Log_name         | File_size |
+------------------+-----------+
| slave-bin.000001 |       151 |
| slave-bin.000002 |     69327 |
| slave-bin.000003 |   1237646 |
| slave-bin.000004 |       151 |
+------------------+-----------+
4 rows in set (0.00 sec)

mysql> set global gtid_purged = '9e19bc8f-f82f-11e3-953d-00163e44510c:1-100';
Query OK, 0 rows affected (0.01 sec)

mysql> change master to master_host='127.0.0.1', master_port=3306, master_user='repl',MASTER_PASSWORD='slavepass', master_auto_position=1;
Query OK, 0 rows affected, 2 warnings (0.07 sec)

mysql> start slave;
Query OK, 0 rows affected (0.00 sec)

// from error log
2014-06-20 09:40:48 28051 [Note] Slave I/O thread: connected to master 'repl@127.0.0.1:3306',replication started in log 'FIRST' at position 4
[18 Jun 2014 13:51] MySQL Verification Team
// Master - extract master's binary logs to see NULL vs real UUID

[root@cluster-repo mysql-advanced-5.6.19]# bin/mysqlbinlog /tmp/master/master-bin.000003
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#140620  9:31:02 server id 1  end_log_pos 120 CRC32 0xfa97961d  Start: binlog v 4, server v 5.6.19-enterprise-commercial-advanced-log created 140620  9:31:02 at startup
ROLLBACK/*!*/;
BINLOG '
/rGjUw8BAAAAdAAAAHgAAAAAAAQANS42LjE5LWVudGVycHJpc2UtY29tbWVyY2lhbC1hZHZhbmNl
ZC1sb2cAAAAAAAAAAAD+saNTEzgNAAgAEgAEBAQEEgAAXAAEGggAAAAICAgCAAAACgoKGRkAAR2W
l/o=
'/*!*/;
# at 120
#140620  9:31:02 server id 1  end_log_pos 151 CRC32 0x384e6fe1  Previous-GTIDs
# [empty] <------------------------------------------------------------------------------------------------NULL
# at 151
#140620  9:37:20 server id 1  end_log_pos 199 CRC32 0xc3269bf2  Rotate to master-bin.000004  pos: 4
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
[root@cluster-repo mysql-advanced-5.6.19]# bin/mysqlbinlog /tmp/master/master-bin.000004
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#140620  9:37:20 server id 1  end_log_pos 120 CRC32 0x2422f67d  Start: binlog v 4, server v 5.6.19-enterprise-commercial-advanced-log created 140620  9:37:20
# Warning: this binlog is either in use or was not closed properly.
BINLOG '
eLOjUw8BAAAAdAAAAHgAAAABAAQANS42LjE5LWVudGVycHJpc2UtY29tbWVyY2lhbC1hZHZhbmNl
ZC1sb2cAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAAXAAEGggAAAAICAgCAAAACgoKGRkAAX32
IiQ=
'/*!*/;
# at 120
#140620  9:37:20 server id 1  end_log_pos 191 CRC32 0xe7ddbf75  Previous-GTIDs
# 7edc7076-f82f-11e3-953c-00163e44510c:1-1000  <------------------------------------------------------------NOT NULL
# at 191
#140620  9:39:16 server id 1  end_log_pos 239 CRC32 0x5e2569b4  GTID [commit=yes]
SET @@SESSION.GTID_NEXT= '7edc7076-f82f-11e3-953c-00163e44510c:1001'/*!*/;
# at 239
#140620  9:39:16 server id 1  end_log_pos 407 CRC32 0x1cccd7f4  Query   thread_id=1     exec_time=0     error_code=0
SET TIMESTAMP=1403237356/*!*/;
SET @@session.pseudo_thread_id=1/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=1073741824/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=8/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
CREATE USER 'repl'@'localhost' IDENTIFIED BY PASSWORD '*809534247D21AC735802078139D8A854F45C31F3'
/*!*/;
# at 407
#140620  9:39:16 server id 1  end_log_pos 455 CRC32 0x9c79d7b6  GTID [commit=yes]
SET @@SESSION.GTID_NEXT= '7edc7076-f82f-11e3-953c-00163e44510c:1002'/*!*/;
# at 455
#140620  9:39:16 server id 1  end_log_pos 594 CRC32 0xd2d25fef  Query   thread_id=1     exec_time=0     error_code=0
SET TIMESTAMP=1403237356/*!*/;
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'localhost'
/*!*/;
# at 594
#140620  9:39:16 server id 1  end_log_pos 642 CRC32 0x2e128167  GTID [commit=yes]
SET @@SESSION.GTID_NEXT= '7edc7076-f82f-11e3-953c-00163e44510c:1003'/*!*/;
# at 642
#140620  9:39:16 server id 1  end_log_pos 810 CRC32 0xb465858b  Query   thread_id=1     exec_time=0     error_code=0
SET TIMESTAMP=1403237356/*!*/;
CREATE USER 'repl'@'127.0.0.1' IDENTIFIED BY PASSWORD '*809534247D21AC735802078139D8A854F45C31F3'
/*!*/;
# at 810
#140620  9:39:16 server id 1  end_log_pos 858 CRC32 0x90dee9a8  GTID [commit=yes]
SET @@SESSION.GTID_NEXT= '7edc7076-f82f-11e3-953c-00163e44510c:1004'/*!*/;
# at 858
#140620  9:39:16 server id 1  end_log_pos 997 CRC32 0x90cb2489  Query   thread_id=1     exec_time=0     error_code=0
SET TIMESTAMP=1403237356/*!*/;
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'127.0.0.1'
/*!*/;
# at 997
#140620  9:39:17 server id 1  end_log_pos 1045 CRC32 0x93ee9c11         GTID [commit=yes]
SET @@SESSION.GTID_NEXT= '7edc7076-f82f-11e3-953c-00163e44510c:1005'/*!*/;
# at 1045
#140620  9:39:17 server id 1  end_log_pos 1124 CRC32 0x06a8e867         Query   thread_id=1     exec_time=0     error_code=0
SET TIMESTAMP=1403237357/*!*/;
FLUSH PRIVILEGES
/*!*/;
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
[root@cluster-repo mysql-advanced-5.6.19]#
[18 Jun 2014 14:00] MySQL Verification Team
// Second case - On slave - when tried using set global gtid_purged = 'MASTER_UUID:1-100';

mysql> show master logs;
+-------------------+-----------+
| Log_name          | File_size |
+-------------------+-----------+
| master-bin.000001 |     65294 |
| master-bin.000002 |   1071072 |
| master-bin.000003 |       151 |
+-------------------+-----------+
3 rows in set (0.00 sec)

mysql> show variables like 'server_uuid';
+---------------+--------------------------------------+
| Variable_name | Value                                |
+---------------+--------------------------------------+
| server_uuid   | 84f959cc-f834-11e3-955d-00163e44510c |
+---------------+--------------------------------------+
1 row in set (0.00 sec)

mysql> set global gtid_purged = '84f959cc-f834-11e3-955d-00163e44510c:1-1000';
Query OK, 0 rows affected (0.01 sec)

mysql> show master logs;
+-------------------+-----------+
| Log_name          | File_size |
+-------------------+-----------+
| master-bin.000001 |     65294 |
| master-bin.000002 |   1071072 |
| master-bin.000003 |       199 |
| master-bin.000004 |       191 |
+-------------------+-----------+
4 rows in set (0.00 sec)

// Slave

mysql> set global gtid_purged = '84f959cc-f834-11e3-955d-00163e44510c:1-100';
Query OK, 0 rows affected (0.02 sec)

mysql> change master to master_host='127.0.0.1', master_port=3306, master_user='repl',MASTER_PASSWORD='slavepass', master_auto_position=1;
Query OK, 0 rows affected, 2 warnings (0.04 sec)

mysql> start slave;
Query OK, 0 rows affected (0.00 sec)

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 127.0.0.1
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: master-bin.000004
          Read_Master_Log_Pos: 1124
               Relay_Log_File: slave-relay-bin.000004
                Relay_Log_Pos: 1296
        Relay_Master_Log_File: master-bin.000004
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 1124
              Relay_Log_Space: 1548
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
                  Master_UUID: 84f959cc-f834-11e3-955d-00163e44510c
             Master_Info_File: /tmp/slave/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set: 84f959cc-f834-11e3-955d-00163e44510c:1001-1005
            Executed_Gtid_Set: 84f959cc-f834-11e3-955d-00163e44510c:1-100:1001-1005
                Auto_Position: 1
1 row in set (0.00 sec)
[3 Nov 2014 9:34] David Moss
Thank you for your feedback. The following was added to the 5.6.22 and 5.7.6 release notes with commit 4560:
When using GTIDs for replication and with MASTER_AUTO_POSITION enabled, if a slave requested GTIDs which had been already been purged by the master, the master was sending all available GTIDs. This happened because the master reads all available binary logs and searches for a binary log which contains a GTID that is not contained in the union of gtid_executed and gtid_retrieved. If such a GTID is found, the master starts sending the information starting from that location. In a situation where the union of the slave's gtid_executed and gtid_retreived set did not contain the master's gtid_purged set, the slave would expect GTIDs which had already been purged by the master. This fix ensures that in such a situation, the slave's I/O thread is aborted with an error "Master has purged binary logs containing GTIDs that the slave requires.".
[10 Dec 2014 14:11] Laurynas Biveinis
$ bzr log -r 6211
------------------------------------------------------------
revno: 6211
committer: Venkatesh Duggirala<venkatesh.duggirala@oracle.com>
branch nick: mysql-5.6
timestamp: Mon 2014-10-20 15:11:32 +0530
message:
  Bug#19012085 SETTING GTID_PURGED MAY BREAK AUTO_POSITION
  AND THUS SLAVES
        
  Problem:
  When slave asks for GTIDs which were purged by Master,
  Master is silently ignoring it and sending available GTIDs
        
  Analysis:
  In auto_postion=1 protocol, when I/O thread is connecting,
  dump thread receives slave's current available gtid sets
  (gtid_executed + gtid_retrieved) information. Master reads
  all available binary logs and search for a binary log which
  contains a gtid which is not part of this list and starts
  sending the information starting from that location.
  But it is not doing any preliminary validation on the gtid sets.
        
  In the current bug scenario, Master has already purged few gtids
  which Slave does not have them. Master is silently ignoring
  this scenario and sending the gtids what ever is available on it.
  This will make Slave to have some gaps in gtid sets which in turn
  will cause many problems like inconsistency. 
        
  Setting GTID_PURGED (when GTID_EXECUTED set is empty i.e., when
  previous_gtids are also empty) will make binlog rotate. That
  leaves first binary log with empty previous_gtids and second
  binary log's previous_gtids with the value of gtid_purged.
  In find_first_log_not_in_gtid_set() while we search for a binary
  log whose previous_gtid_set is subset of slave_gtid_executed,
  in this particular case, server will always find the first binary
  log with empty previous_gtids which is subset of any given
  slave_gtid_executed. Thus Master thinks that it found the first
  binary log which is actually not correct and unable to catch
  this error situation. Hence adding an extra if condition
  to check the situation. Slave should know about Master's purged GTIDs.
  If Slave's GTID executed + retrieved set does not contain Master's
  complete purged GTID list, that means Slave is requesting(expecting)
  GTIDs which were purged by Master. We should let Slave know about the
  situation. i.e., throw error if slave's GTID executed set is not
  a superset of Master's purged GTID set.
  The other case, where user deleted binary logs manually
  (with out using 'PURGE BINARY LOGS' command) but gtid_purged
  is not set by the user, the following if condition cannot catch
  it. But that is not problem because in find_first_log_not_in_gtid_set()
  while checking for subset previous_gtids binary log, the logic
  will not find one and an error ER_MASTER_HAS_PURGED_REQUIRED_GTIDS
  is thrown from there.
  Now, after the patch,
  Slave's I/O thread, when connecting to Master in this situation,
  will be aborted with an error "Master has purged binary logs
  containing GTIDs that the slave requires."