Bug #1818 Replication failed between 4.0.16-4.0.16 on Linux. Reproducible.
Submitted: 12 Nov 2003 10:05 Modified: 21 Jun 2004 11:11
Reporter: Renato Weiner
Status: Not a Bug
Category: MySQL Server: Replication
Severity: S2 (Serious)
Version: 4.0.19
OS: Linux (RedHat 7.2 or 7.3)
Assigned to: Guilhem Bichot
CPU Architecture: Any

[12 Nov 2003 10:05] Renato Weiner
Description:
Set up two servers, both running 4.0.16, for replication. Replication fails after a while with the following message in the log:

031105 12:08:23  Slave I/O thread: connected to master 'xxxxxx',  replication started in log 'ib_logbin1.001' at position 45827132
031105 12:08:23  Error reading packet from server: log event entry exceeded max_allowed_packet; Increase max_allowed_packet on master (server_errno=1236)
031105 12:08:23  Got fatal error 1236: 'log event entry exceeded max_allowed_packet; Increase max_allowed_packet on master' from master when reading data from binary log

How to repeat:
On the master, set the following parameters in /etc/my.cnf:

set-variable    = max_allowed_packet=512M ( not necessary, but I did according to the message... )
set-variable    = max_binlog_size=512M

Execute lots of simple inserts/updates/deletes. When ib_logbin1.001 reaches approximately 40 MB, you will see the message above.

Suggested fix:
Right now, I'm using the following workaround which seems to work well so far:

set-variable    = max_binlog_size=40M

But with this, I have lots of annoying 40 MB size ib_logbin1.xxx files.
[13 Nov 2003 2:35] Renato Weiner
It looks like my 'solution' of splitting up the binlog didn't work either. Today I had another failure. It lasted a bit longer, but replication still doesn't work reliably. Message:

031113  0:35:26  Error reading packet from server: log event entry exceeded max_allowed_packet; Increase max_allowed_packet on master (server_errno=1236)
031113  0:35:26  Got fatal error 1236: 'log event entry exceeded max_allowed_packet; Increase max_allowed_packet on master' from master when reading data from binary log
031113  0:35:26  Slave I/O thread exiting, read up to log 'ib_logbin1.004', position 15978017
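
One quick sanity check for this kind of failure is to compare the position the slave stopped at with the actual size of that binlog on the master: a position beyond the end of the file would suggest the file became shorter after the slave had read from it. The following is only a sketch, assuming the error-log line format shown above; parse_stop_point and position_past_eof are hypothetical helper names, not part of MySQL.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of MySQL.

# Read slave error-log lines on stdin and print "<binlog name> <position>"
# for each line of the form:
#   ... Slave I/O thread exiting, read up to log 'NAME', position N
parse_stop_point() {
    sed -n "s/.*read up to log '\([^']*\)', position \([0-9][0-9]*\).*/\1 \2/p"
}

# Exit 0 if the position lies beyond the current end of the file,
# exit 1 if it does not, exit 2 if the file does not exist.
#   $1 = path to the binlog on the master
#   $2 = position reported by the slave
position_past_eof() {
    [ -f "$1" ] || return 2
    size=$(wc -c < "$1" | tr -d ' ')
    [ "$2" -gt "$size" ]
}
```

For example, piping the last log line above into parse_stop_point prints "ib_logbin1.004 15978017"; running position_past_eof on the master with the path to that file and that position then tells whether the slave was asking for data the file no longer contains.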
[25 Nov 2003 10:12] Guilhem Bichot
Hi!

I'm looking forward to knowing whether using our official binaries solved the problem.

Regards,
Guilhem
[26 Nov 2003 12:59] Renato Weiner
I tested with the binaries provided on the website and it still didn't work.
[24 Apr 2004 15:58] Renato Weiner
I tried replication with versions 4.0.18 and 4.1.1-alpha and got exactly the same error.

I have a build with debugging enabled and I'm wondering which functions I should include in the trace. Maybe something like:

-#d:f,mysql_binlog_send:F:L:t,20

Please advise me, so I can provide more feedback.
[26 Apr 2004 14:00] Guilhem Bichot
Doing some more tests with Mr. Renato Weiner.
[15 May 2004 19:13] Guilhem Bichot
Continuing tests with Mr. Weiner
[7 Jun 2004 11:17] Guilhem Bichot
User is testing on different hardware/OS.
[18 Jun 2004 21:30] Renato Weiner
Hi Guilhem,

As you recommended, I completely switched my OS and now everything is working.

In case anybody has this problem:
I was using RedHat AS 3.0 with the aacraid module. It randomly truncates the master binary logs, causing the error described in this bug. With the aic7xxx module, replication is now working fine.

I recommend checking your OS if you get this error.

Thanks Guilhem for all the patience and help !!
[21 Jun 2004 11:11] Guilhem Bichot
Glad that your system is now working fine, and that it was not a MySQL problem!
[7 Nov 2005 18:58] Shengyong Hu
Hi Guilhem,

Could you tell us what suggestion you gave to Renato? And what did you modify for the testing?

Thanks
[8 Nov 2005 13:48] Guilhem Bichot
Hello Shengyong,
With Renato I think we didn't get complete knowledge of what was wrong: the problem appeared on Redhat 7.3 while there were no problems with Redhat AS 3.0. So it may have been a kernel/glibc issue.
We ruled out a MySQL bug by demonstrating that the binlog was shrinking (which MySQL cannot be responsible for as it never calls ftruncate() on such files): some statements disappeared from the binlog while they were there the second before. For this, Renato set up a script which prints the size of the last binlog every second, to a file. Something like
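while true
do
  # Append the long listing (which includes the size) of the newest binlog;
  # your_binlogs is a placeholder for your own binlog file pattern.
  ls -l your_binlogs | tail -n1 >> list.txt
  sleep 1
done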
Then, when the error occurred on the slave, he inspected list.txt and found that at some point the size of the binlog had decreased.
So we supposed that it was an issue with a disk driver in the OS, glibc, or something similar.
Good luck!
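
The inspection of list.txt described above can also be automated. The following is only a sketch, assuming every line of list.txt was written by `ls -l` (so the size is in field 5 and the file name is the last field); find_shrinks is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper for illustration.
# Report every point where the recorded size of a binlog went down between
# two consecutive samples of the same file. A change of file name (binlog
# rotation) is not reported, since a new binlog legitimately starts small.
# Assumes `ls -l` lines: size in field 5, file name in the last field.
find_shrinks() {
    awk '
        $NF == prev_name && $5 + 0 < prev_size + 0 {
            printf "line %d: %s shrank from %s to %s bytes\n", NR, $NF, prev_size, $5
        }
        { prev_name = $NF; prev_size = $5 }
    ' "$@"
}
```

Running find_shrinks list.txt after the slave reports the error prints one line per shrink, with the line number in list.txt; since one sample is taken per second, that line number also gives a rough idea of when it happened.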
[18 Jun 2006 19:35] Andrei Elkin
The reporter has not said what kind of query was stuck in his binlog, but I can guess:
the case resembles bug#9822 and bug#19402, where the binlog contains queries the size of max_allowed_packet.