MySQL Bugs: #65151: STOP SLAVE/START SLAVE failure with gtid-mode=ON

Bug #65151	STOP SLAVE/START SLAVE failure with gtid-mode=ON
Submitted:	30 Apr 2012 4:58	Modified:	10 Jun 2012 15:23
Reporter:	Joffrey MICHAIE	Email Updates:
Status:	No Feedback	Impact on me:	None
Category:	MySQL Server: Replication	Severity:	S2 (Serious)
Version:	5.6.5m8	OS:	Linux (RHEL 6.2 64bits)
Assigned to:		CPU Architecture:	Any
Tags:	error 1236, start slave, stop slave

Description:
Using 5.6 Replication with GTID, stop slave and start slave fails with random errors. Reset slave, and start slave is required to resume replication (or issue again change master)

Error message 1 :
Got fatal error 1236 from master when reading data from binary log: 'log event entry exceeded max_allowed_packet; Increase max_allowed_packet on master; the first event 'mysql-bin.000004' at 330, the last event read from './mysql-bin.000004' at 330, the last byte read from './mysql-bin.000004' at 349.'

Error message 2 :
Last_SQL_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.

How to repeat:
1) Start 2xMySQL 5.6 with following configuration :

---- Server 1 ----
[client]
port		= 3306
socket		= /var/lib/mysql/mysql.sock
[mysqld]
port		= 3306
socket		= /var/lib/mysql/mysql.sock
skip-external-locking
max_allowed_packet = 16M
max_connections = 100
skip-name-resolve = 1
server-id	= 1
gtid_mode = ON
log_slave_updates = 1
disable-gtid-unsafe-statements = 1
log-bin=mysql-bin
max_binlog_size = 1G
binlog_format=ROW
slave_compressed_protocol = 1
innodb_buffer_pool_size = 64M
innodb_log_file_size = 256M
innodb_log_buffer_size = 8M
innodb_flush_log_at_trx_commit = 1
innodb_lock_wait_timeout = 50
innodb_file_per_table = 1
innodb_file_format = Barracuda
[mysqldump]
quick
max_allowed_packet = 16M
[mysql]
no-auto-rehash
[myisamchk]
key_buffer_size = 8M
sort_buffer_size = 8M
[mysqlhotcopy]
interactive-timeout

---- Server 2 ----
Idem as Server 1 but :
server-id = 2

2) Create replication user

mysql server1 > GRANT REPLICATION SLAVE ON *.* TO 'replic'@'%' IDENTIFIED BY 'replic';          

3) Start replication 
mysql server2 > CHANGE MASTER TO MASTER_HOST='IP.OF.SERVER1',MASTER_PORT=3306,MASTER_USER='replic',MASTER_PASSWORD='replic',MASTER_AUTO_POSITION=1;            
mysql server2 > START SLAVE;
mysql server2 > SHOW SLAVE STATUS\G
Verify it is working :
(...)
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
(...)
        Seconds_Behind_Master: 0                   

4) Stop slave; Start slave;
mysql server2> STOP SLAVE;
mysql server2> START SLAVE;
mysql server2> SHOW SLAVE STATUS\G

Verify following error :
(...)
            Slave_IO_Running: No
            Slave_SQL_Running: Yes
(...)
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'log event entry exceeded max_allowed_packet; Increase max_allowed_packet on master; the first event 'mysql-bin.000002' at 133750, the last event read from './mysql-bin.000002' at 133750, the last byte read from './mysql-bin.000002' at 133769.'

5) Re-enable replication
mysql server2 > STOP SLAVE;

mysql server2 > RESET SLAVE;

mysql server2 > START SLAVE;

mysql server2 > show slave status\G

Verify it is back up and running

Suggested fix:
With GTID, is it fantastic to be able to reset replication by running again a "CHANGE MASTER" or "RESET SLAVE", but simple operations such as "stop slave/start slave" should not fail with an error

PS : could not reproduce error message 2, but saw it many times while testing new replication features, always solved by issuing a "CHANGE MASTER" or "RESET SLAVE"

Thank you for the report.

I can not repeat described behavior. Do you have some activity on master in time of failing event? Could you please send us problem binary log file?

Hi!

Whith mysql 5.5.22 I have some problem. Replication master-master work with 2 5.5.22 servers. Then i run this commands on one is:

mysql> slave stop;
mysql> create database phpmyadmin;
sh> mysql phpmyadmin < phpmyadmin.sql (in another console)
mysql> start slave;
mysql> SHOW SLAVE STATUS \G;
*************************** 1. row ***************************
               Slave_IO_State:
                  Master_Host: 192.168.0.2
                  Master_User: rep_user2
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000008
          Read_Master_Log_Pos: 19306
               Relay_Log_File: slave-relay-bin.000002
                Relay_Log_Pos: 253
        Relay_Master_Log_File: mysql-bin.000008
             Slave_IO_Running: No
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB: mysql,test,information_schema,perfomance_schema
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 1594
                   Last_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 106
              Relay_Log_Space: 19609
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 1236
                Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'log event entry exceeded max_allowed_packet; Increase max_allowed_packet on master; the first event 'mysql-bin.000008' at 106, the last event read from '/mnt/sdb/var/lib/mysql/mysql-bin.000008' at 46592, the last byte read from '/mnt/sdb/var/lib/mysql/mysql-bin.000008' at 19325.'
               Last_SQL_Errno: 1594
               Last_SQL_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
1 row in set (0,00 sec)

ERROR:
No query specified

---------------------------------------------------------------------

I change max_allowed_packet to 1G on both servers, restart both and got some error.

Oops! I am sorry, it's my mistake. I dump all files with --master-data options and want restore it after 2 week..

Andrey,

thank you for update. So i assume we can ignore your initial comment. Please reject if not.

I am still waiting, from original reporter, feedback I requested.

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

Your solution of "reset slave" is great... I was trouble with error 1032 after lost the sincronization and  set new parameters, after start slave I was the error 1032, try many solutions and not worked, but with your solution this was sufficient

Thank you so much