MySQL Bugs: #29272: Replicate errors with 1236 after harddrive upgrade in 5.0.41

Submitted:	21 Jun 2007 14:58	Modified:	22 Jun 2007 0:23
Reporter:	Tom Sensel
Status:	Not a Bug
Category:	MySQL Server: Replication	Severity:	S2 (Serious)
Version:	5.0.41	OS:	Linux
CPU Architecture:	Any

Description:
Hello,

I have two servers running in a dual-master replication setup. Each is a quad-processor, dual-core machine with 8 GB of RAM and a dedicated RAID 0 array for the /var/lib/mysql directory.

I currently store all of my relay and binary log files on the main OS drive, in /var/log/mysql/.

Before the upgrade, I copied all of the data off the drive into a /mysql/ directory on the main OS drive. After the upgrade, I turned on directory indexing, ran an e2fsck check, and then simply copied all of the data back.

When the server restarted, I expected it to pick up where it had left off with our primary server and catch up using the binary logs. That did not happen.

I'm now getting these errors with a configuration file that worked just fine before.

070621 9:48:42 [Note] Slave SQL thread initialized, starting replication in log 'log-bin.000004' at position 107242261, relay log '/var/log/mysql/server2-relay-bin.000003' position: 98
070621 9:48:42 [Note] Slave I/O thread: connected to master 'rmtmasta1@10.11.109.140:3306', replication started in log 'log-bin.000004' at position 107242261
070621 9:48:42 [ERROR] Error reading packet from server: log event entry exceeded max_allowed_packet; Increase max_allowed_packet on master ( server_errno=1236)
070621 9:48:42 [ERROR] Got fatal error 1236: 'log event entry exceeded max_allowed_packet; Increase max_allowed_packet on master' from master when reading data from binary log
070621 9:48:42 [Note] Slave I/O thread exiting, read up to log 'log-bin.000004', position 107242261

To fix this, I increased max_allowed_packet on both the primary and the secondary server. It is currently set to 2 GB (2048MB in my.cnf), but I am still getting this error.
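One hedged observation on that setting: the MySQL 5.0 reference manual documents max_allowed_packet with a range of 1024 bytes to 1 GB, and says values are rounded down to a multiple of 1024. If that holds here, a 2048MB request would presumably be capped at 1 GB rather than honored as written. The sketch below models only that documented clamping, not the server's actual option-parsing code; the helper name effective_max_allowed_packet is hypothetical.

```python
# Documented range for max_allowed_packet in MySQL 5.0 (assumption taken
# from the reference manual): minimum 1024 bytes, maximum 1 GB, and values
# are rounded down to a multiple of 1024.
MIN_PACKET = 1024
MAX_PACKET = 1024 ** 3  # 1 GB = 1073741824 bytes
BLOCK_SIZE = 1024


def effective_max_allowed_packet(requested: int) -> int:
    """Hypothetical helper: clamp a requested value into the documented
    range, then round it down to a multiple of BLOCK_SIZE."""
    clamped = max(MIN_PACKET, min(requested, MAX_PACKET))
    return clamped - (clamped % BLOCK_SIZE)


requested = 2048 * 1024 * 1024  # the "2048MB" from my.cnf
print(effective_max_allowed_packet(requested))  # 1073741824
```

Under these assumptions the effective limit is 1073741824 bytes, which is still smaller than the data_len of 1426088546 that mysqlbinlog reports later in this thread.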

Here are the sizes of the binary logs on the primary server that the slave is trying to catch up on:

-rw-rw---- 1 mysql mysql 680971880 Jun 21 09:34 log-bin.000004
-rw-rw---- 1 mysql mysql 890377 Jun 21 09:39 log-bin.000005
-rw-rw---- 1 mysql mysql 1197152 Jun 21 09:42 log-bin.000006
-rw-rw---- 1 mysql mysql 3609980 Jun 21 09:48 log-bin.000007
-rw-rw---- 1 mysql mysql 7034267 Jun 21 09:56 log-bin.000008

I would expect a 2 GB max_allowed_packet to be large enough for the entire file. Did I hit a bug, or did I overlook something? I have also tried resetting the master and running CHANGE MASTER again, without success, and I am at a loss as to what to try next.

How to repeat:

Run 5.0.41 and upgrade your hard drives?

I should also note that we are running approximately 2 million tables in a single database, with approximately 130 GB of InnoDB data. I'm not sure whether that matters in this case.

If I run mysqlbinlog on that file at that start position, this is the output I get. As per libmysqld/log_event.cc, the error comes from this check: if (data_len > max_allowed_packet) { error = "Event too big"; }

However, as I already said, max_allowed_packet is currently set to 2 GB in my configuration, and I don't see why it would matter when I'm reading the binlog directly as a file with mysqlbinlog.

[root@server2 sensel]# mysqlbinlog --start-position=107242261 log-bin.000004
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
ERROR: Error in Log_event::read_log_event(): 'Event too big', data_len: 1426088546, event_type: 33
Could not read entry at offset 107242261:Error in log format or read error
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
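To see what the quoted check reacts to, here is a minimal sketch that reads the header of a binary log event at a given offset. It assumes the v4 binlog format used by MySQL 5.0: a 19-byte little-endian header made of a 4-byte timestamp, 1-byte type code, 4-byte server id, 4-byte event length, 4-byte next position and 2-byte flags. It then applies two sanity checks: the "Event too big" comparison quoted above, and a comparison against the size of the file itself. The names read_event_header and check_event are hypothetical; this is not mysqlbinlog's implementation.

```python
import struct
from typing import NamedTuple

# Assumed v4 event header layout: timestamp (4), type_code (1),
# server_id (4), event_length (4), next_position (4), flags (2).
# "<" means little-endian with no padding, so the header is 19 bytes.
HEADER_FORMAT = "<IBIIIH"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 19


class EventHeader(NamedTuple):
    timestamp: int
    type_code: int
    server_id: int
    event_length: int
    next_position: int
    flags: int


def read_event_header(path: str, offset: int) -> EventHeader:
    """Hypothetical helper: read and decode the header at `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)
        raw = f.read(HEADER_SIZE)
    if len(raw) < HEADER_SIZE:
        raise ValueError("offset is too close to the end of the file")
    return EventHeader(*struct.unpack(HEADER_FORMAT, raw))


def check_event(header: EventHeader, offset: int, file_size: int,
                max_allowed_packet: int) -> list[str]:
    """Return the names of the sanity checks this header fails."""
    problems = []
    # Mirrors the quoted check: if (data_len > max_allowed_packet) ...
    if header.event_length > max_allowed_packet:
        problems.append("Event too big")
    # An event cannot extend past the end of the file that contains it.
    if offset + header.event_length > file_size:
        problems.append("event runs past end of file")
    return problems
```

Applied to the figures in this report, the second check fails on its own: an event length of 1426088546 bytes starting at offset 107242261 would run well past the end of the 680971880-byte log-bin.000004 listed above, whatever max_allowed_packet is set to.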

We're sorry, but the bug system is not the appropriate forum for asking help on using MySQL products. Your problem may not be the result of a bug. In any case, we need a set of exact steps to repeat a problem each and every time to check if this is a bug or not.

Support on using our products is available both free in our forums at http://forums.mysql.com/ and for a reasonable fee direct from our skilled support engineers at http://www.mysql.com/support/

Thank you for your interest in MySQL.

This is what the head technician of my datacenter had to say.

-------------------------------------------------
From our research, we have noted that simply copying the data back after an upgrade normally results in these sorts of issues.

And the recommended solution was to take an exact snap shot of your master / slave servers before an upgrade with any third party server snap shot utilities and restoring the same after the upgrade.

MySQL support recommends taking a mysqldump before any sort of upgrade and restoring it afterwards. Although just copying the files seems to work in some cases, in some rare instances the database has been noted to suffer data loss and retain only the table structure.
---------------------------------------------EOF
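For reference, the mysqldump route the technician mentions could be scripted roughly as below. This is a sketch under stated assumptions: --all-databases, --single-transaction and --master-data=2 are standard mysqldump options, but the user name is a placeholder, credentials are assumed to come from an option file such as ~/.my.cnf, and --single-transaction only gives a consistent snapshot for InnoDB tables. The helper names mysqldump_command and dump_to_file are hypothetical.

```python
import subprocess
from pathlib import Path


def mysqldump_command(user: str) -> list[str]:
    """Hypothetical helper: build a full logical dump command.
    The password is assumed to come from an option file."""
    return [
        "mysqldump",
        f"--user={user}",
        "--all-databases",
        # Consistent snapshot for InnoDB tables without locking them all.
        "--single-transaction",
        # Write the binary log coordinates into the dump as a comment.
        "--master-data=2",
    ]


def dump_to_file(user: str, out: Path) -> None:
    """Run mysqldump, streaming its output into `out`; raises
    subprocess.CalledProcessError if mysqldump exits non-zero."""
    with out.open("wb") as f:
        subprocess.run(mysqldump_command(user), stdout=f, check=True)
```

With roughly 130 GB of InnoDB data, a logical dump and reload will take far longer than a file copy, which is one reason a consistent file-level snapshot is attractive.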

Sounds like a bug to me.

This sounds reasonable:

"And the recommended solution was to take an exact snap shot of your
master / slave servers before an upgrade with any third party server
snap shot utilities and restoring the same after the upgrade."

but note that you need a snapshot of the files on disk in a consistent state. Please read http://dev.mysql.com/doc/refman/5.0/en/backup.html about the recommended backup options.

So all changes must be stopped while you are making the snapshot. If you, for example, shut the server down cleanly (!), then copy all relevant files and restore them onto a new drive, I do not believe you will run into any problem.
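A minimal sketch of the copy step described above, with a checksum comparison so that a silent copy error shows up before the server is restarted. It assumes mysqld has already been shut down cleanly (the code does not stop it), and the paths in the usage comment are placeholders. The helper names sha256_of and copy_and_verify are hypothetical, and this does not replace the backup methods on the linked manual page.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large InnoDB files are not
    loaded into memory all at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src: Path, dst: Path) -> list[Path]:
    """Copy the tree at src to dst (dst must not exist yet) and return
    the relative paths of files that are missing or differ in dst.
    Assumes mysqld is stopped, so no file changes during the copy."""
    # copytree uses copy2, which keeps modes and timestamps but NOT
    # owner/group; restore mysql:mysql ownership separately afterwards.
    shutil.copytree(src, dst)
    mismatches = []
    for file in sorted(p for p in src.rglob("*") if p.is_file()):
        rel = file.relative_to(src)
        target = dst / rel
        if not target.is_file() or sha256_of(file) != sha256_of(target):
            mismatches.append(rel)
    return mismatches


# Hypothetical usage, only while mysqld is stopped:
# bad = copy_and_verify(Path("/var/lib/mysql"), Path("/mysql/lib-backup"))
# if bad:
#     print("copy differs:", bad)
```

The same check can be run again when copying the data back onto the new drive, and should cover the relay and binary logs in /var/log/mysql as well as the data directory.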

If you think your hard drive upgrade process was correct, please describe it in more detail, step by step, and check that these steps consistently give you the error. Then add a comment here and send your my.cnf and the entire error log.

This could be the result of a bug in the server, but there is no way to check that based on your report yet.