Bug #8215 Replication fails with "log event entry exceeded max_allowed_packet"
Submitted: 31 Jan 2005 15:34 Modified: 21 Jan 2014 17:45
Reporter: Kenneth Hedlund Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: Replication Severity:S2 (Serious)
Version:4.1.7 OS:Any (White Box EL release 3.0)
Assigned to: CPU Architecture:Any

[31 Jan 2005 15:34] Kenneth Hedlund
Description:
Replication randomly fails after running for some days with the following log entrys:

050131 11:53:44 [ERROR] Error reading packet from server: log event entry exceeded max_allowed_packet; Increase max_allowed_packet on master (server_errno=1236)
050131 11:53:44 [ERROR] Got fatal error 1236: 'log event entry exceeded max_allowed_packet; Increase max_allowed_packet on master' from master when reading data from binary log
050131 11:53:44 [ERROR] Slave I/O thread exiting, read up to log 'master-bin.000003', position 33116423

Only way to get replication up and running again is to restart the master with a new bin-log file.
When  I examine entrys in 'master-bin.000003' with mysqlbinlog (comparing the "# at <pos>" with the following "log_pos" entry) they match until this error occurs.

This is the cmd I use (on the master):

mysqlbinlog fenja-bin.000003 |egrep '# at | log_pos' |awk '{if (NF == 3) {pos=$3; getline}; if ($7 == pos) print pos,$1,$2,$7}' | tail -5

Which gives the following:

  33115120 #050131 11:52:47 33115120
  33115165 #050131 11:52:47 33115165
  33115426 #050131 11:52:47 33115426
  33115471 #050131 11:53:12 33115471
  33115741 #050131 11:53:12 33115741

After this point the "#at" entry and the "log_pos" entry don't match anymore ...
Nothing else strange is found in this log or in the .err logfile and the master keeps on running.

It looks like mysql looses track of its position in the bin-log file (my wild guess!).

Any ideas?

Regards
Ken

How to repeat:
Setup replication and let it run for some days. It doesn't look like the size of the bin-log matters.
This time it was 33MB and the previous time it was 845 MB.
[31 Jan 2005 21:00] Guilhem Bichot
Hi Ken,
We would be infinitely interested in seeing such a weird binlog. You may have found a specimen of an old claimed bug which we never were able to reproduce.
I'm asking for the file because I would need to dig into it myself, to see if the difference between "#at" and "log_pos" has a valid reason.
Could you please upload a very compressed failing master's binlog in the "files" section of this report (you will be able to say this file must be visible *only* to developers from MySQL AB)?
Thank you very much.
Guilhem
[2 Feb 2005 21:14] Guilhem Bichot
Doing tests with Kenneth.
[3 Mar 2005 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[3 Mar 2005 7:04] Kenneth Hedlund
After updating both master and slave with the latest patches/versions of kernel, glibc and a lot of other updates the replication has been running for two weeks now without a failure.
I'm close to saying that this solved the problem but would like to wait another week or two before closing this case.
/Ken
[18 Mar 2005 12:41] Kenneth Hedlund
Now the replication has been running for four weeks without a glitch so I'm closing this case. Thanks Guilhem for all the help!
[18 Mar 2005 13:32] Guilhem Bichot
Hello Kenneth,
I am glad it was apparently an OS/glibc/... issue. It is the second time the same symptom are explained by this cause (BUG#1818 is the other one, as you know).
Long life to your MySQL servers!
Guilhem
[7 May 2005 9:59] Guilhem Bichot
Happened again. Kenneth says "I did some comparisions between the master and the slave's bin-log and as you suspected there where entrys in the slave's bin-log that where missing on the master.". I still don't know what's going on there.
[2 Aug 2005 13:49] Kenneth Hedlund
We're still having random replication failures and I really don't know how or what to search for in order to get to the bottom of this problem.
[12 Sep 2005 7:36] Hartmut Holzgraefe
Hi Kenneth, could you provide us with one of the broken binlog files for further analysis?
I seem to have a similar case and i'd like to check whether the corruption within the file
looks similar.

You can upload files to ftp://ftp.mysql.com/pub/mysql/upload, these will be visible to
MySQL staff only.
[8 Feb 2006 10:11] Martijn van den Burg
I get the exact same error on 4.1.10-log, Solaris, after replication has been running for 5 days. On another box, replication ran for weeks without this issue sofar.

When checking out the binlog, using mysqlbinlog, it stops short with the following message:

ERROR: Error in Log_event::read_log_event(): 'Event too big', data_len: 1635131433, event_type: 109
Could not read entry at offset 1384:Error in log format or read error

max_allowed_packet is 64MB, both at the master and at the slave, and examining the binlog with the UNIX 'strings' commando does not reveal huge inserts, so I doubt that this setting is the culprit.

I have the binlog available for your examination.
[8 Feb 2006 10:39] Martijn van den Burg
Checking the slave's relay log, I found that the date/timestamp suddenly changed:

#060208 10:00:40 server id 10  log_pos 1445     Query   thread_id=336905        exec_time=0     error_code=0
SET TIMESTAMP=1139389240;
update sanity_checks set test_status_check_datetime=now() where id=286;
# at 5099786
#700101  1:00:00 server id 10  log_pos 0        Rotate to binlog.001155  pos: 1384
# at 5099826
#700101  1:00:00 server id 10  log_pos 0        Rotate to binlog.001155  pos: 1384
# at 5099866
#700101  1:00:00 server id 10  log_pos 0        Rotate to binlog.001155  pos: 1384
# at 5099906
#700101  1:00:00 server id 10  log_pos 0        Rotate to binlog.001155  pos: 1384
# at 5099946
#700101  1:00:00 server id 10  log_pos 0        Rotate to binlog.001155  pos: 1384
# at 5099986
#700101  1:00:00 server id 10  log_pos 0        Rotate to binlog.001155  pos: 1384
# at 5100026
(logfile ends here)
[22 Feb 2006 20:52] Andreas Frøsting
Hi,

I'm having the exact same problem..
Master is 4.1.11-standard-log running on Gentoo...

The first slave, only replicating some tables from the master, is running freebsd and MySQL 4.1.15 without any problems...

The second is replicating an entire database, but not all data from the master.
This slave is running gentoo and MySQL 4.1.12-log
No problems here either..

The third one, which is causing all the troubles is running freebsd and MySQL 4.1.7-log and replicating all data from the master..

Suddenly replication stops with:

060222 21:28:45 [ERROR] Error reading packet from server: log event entry exceeded max_allowed_packet; Increase max_allowed_packet on master (server_errno=1236)
060222 21:28:45 [ERROR] Got fatal error 1236: 'log event entry exceeded max_allowed_packet; Increase max_allowed_packet on master' from master when reading data from binary log
060222 21:28:45 [ERROR] Slave I/O thread exiting, read up to log 'versatile-bin.000118', position 45657284

After reading this thread, I tried upgrading to MySQL 4.1.18-log on this server, cleared all replication logs and tried to continue from where it stopped.

No luck - same error... Actually, no matter what log position i try, I get the above error... Why?

I removed relay-log.info, master.info, relay binlogs... started from scratch, upgraded mysql - my other slaves can read the binlogs - this one simply _can't_...
[23 Feb 2006 7:37] Valeriy Kravchuk
Andreas,

Please, send the max_allowed_packet values from master and all the slaves.
[23 Feb 2006 7:48] Andreas Frøsting
Hi,

max_allow_packet values...
Master: 104857600 (increased from 1048576 - both fails)
First slave: 2096128
Second slave: 1047552
Third slave (the faulty one): 104856576 (increased from something smaller without any luck)..

My setup has been running for months without any problems at all..

I solved the problem this morning, though...

First I did a flush tables with read lock on the master, recording logs and positions and copied tables etc. to the third slave, then restarted replication.

This failed with the same error...

Then I did a reset master; flush tables with read lock; and copied the tables to the third slave again and restarted replication (on all slaves) - it's still running..

Funky stuff :-)
[23 Feb 2006 13:38] Martijn van den Burg
Same thing happened again today, again after a replication-uptime of approximately five days (replication started from scratch).

This time, event_type is 118, not 109.

Note that again the timestamp of the last entry in the relay-binlog differs dramatically from the last-but-one timestamp:

#060223 14:19:46 server id 10  log_pos 33554399         Query   thread_id=553090        exec_time=0     error_code=0
use sdev_p;
SET TIMESTAMP=1140700786;
update workspaces set modified_by='wrademak', modified_datetime='2006-02-23 14:19:45', status='active' [--snip--] where id=48856;
# at 9454300
#060223 14:19:46 server id 10  log_pos 33554764         Rotate to binlog.001282  pos: 4
# at 9454340
#700101  1:00:00 server id 10  log_pos 0        Rotate to binlog.001282  pos: 4
# at 9454380
#060223 14:19:46 server id 10  log_pos 626      Query   thread_id=553077        exec_time=0     error_code=0
use db_statistics;
SET TIMESTAMP=1140700786;
INSERT INTO db_statistics.1k5 SET x = 869;
# at 9454465
#700101  1:00:00 server id 10  log_pos 0        Rotate to binlog.001282  pos: 89

Broken binlog is available for your analysis, if you want it.
[23 Feb 2006 22:57] Andreas Frøsting
Great...

Just great... Now all my slaves does this crap...
[24 Feb 2006 9:05] Valeriy Kravchuk
So, why not to set max_allowed_packet to the same size on master and all slaves? And set read_buffer_size <= max_allowed_packet, just to be sure... See Bug #15937 for the details.
[24 Mar 2006 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[17 Jan 2014 10:32] wei liu
Our replication has the same issue. MySQL 5.5.16 (for Solaris 10)
The both master and slave have the same configuration as below:
max_allowed_packet = 32M
read_buffer_size = 2M
[21 Jan 2014 17:45] Sveta Smirnova
Thank you for the feedback.

You experience bug #60926 fixed in version 5.5.26. Please upgrade.