MySQL Bugs: #9949: Corruption in master binlog caused by full disk

Bug #9949	Corruption in master binlog caused by full disk
Submitted:	16 Apr 2005 5:12	Modified:	8 Jun 2005 14:38
Reporter:	Robert Bankay	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Replication	Severity:	S2 (Serious)
Version:	4.0.23	OS:	Linux (Linux RH-E3)
Assigned to:		CPU Architecture:	Any

Description:
The master server experienced an out of disk condition that prevented writing of binlog-096. Then space was made available about 35 minutes later and the master continued processing.
When the slave caught up to this position in the master  binlog it received a corrupted sql.  The slave.err is:

050413 15:57:11  mysqld started
050413 15:57:22  InnoDB: Started
/mydisks/a/apps/mysql/bin/mysqld: ready for connections.
Version: '4.0.13-max-log'  socket: '/tmp/mysql.sock'  port: 3306
050413 15:57:22  Slave I/O thread: connected to master 'slave13@jocelyn:3306',  replication started in log 'jocelyn-bin.096' at position 31481884
050413 15:57:22  Error reading packet from server: log event entry exceeded max_allowed_packet; Increase max_allowed_packet on master (server_errno=1236)
050413 15:57:22  Got fatal error 1236: 'log event entry exceeded max_allowed_packet; Increase max_allowed_packet on master' from master when reading data from binary log
050413 15:57:22  Slave I/O thread exiting, read up to log 'jocelyn-bin.096', position 31481884
050413 16:06:04  Slave SQL thread exiting, replication stopped in log  'jocelyn-bin.092' at position 1017768414
050413 19:38:03  Slave I/O thread: connected to master 'slave13@jocelyn:3306',  replication started in log 'jocelyn-bin.096' at position 31481884
050413 19:38:03  Error reading packet from server: log event entry exceeded max_allowed_packet; Increase max_allowed_packet on master (server_errno=1236)
050413 19:38:03  Got fatal error 1236: 'log event entry exceeded max_allowed_packet; Increase max_allowed_packet on master' from master when reading data from binary log
050413 19:38:03  Slave I/O thread exiting, read up to log 'jocelyn-bin.096', position 31481884
ERROR: 1064  You have an error in your SQL syntax.  Check the manual that corresponds to your MySQL server version for the right syn' at line 1near 'ÏZB
', error_code=0  Slave: error 'unexpected success or fatal error' on query 'delete fÏZB
050415 14:37:41  Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'jocelyn-bin.096' position 31481807
ERROR: 1064  You have an error in your SQL syntax.  Check the manual that corresponds to your MySQL server version for the right syn' at line 1near 'ÏZB
', error_code=0  Slave: error 'unexpected success or fatal error' on query 'delete fÏZB
050415 15:30:42  Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'jocelyn-bin.096' position 31481807
050415 15:30:42  Slave I/O thread: connected to master 'slave13@jocelyn:3306',  replication started in log 'jocelyn-bin.096' at position 31481884
050415 15:30:42  Error reading packet from server: log event entry exceeded max_allowed_packet; Increase max_allowed_packet on master (server_errno=1236)
050415 15:30:42  Got fatal error 1236: 'log event entry exceeded max_allowed_packet; Increase max_allowed_packet on master' from master when reading data from binary log
050415 15:30:42  Slave I/O thread exiting, read up to log 'jocelyn-bin.096', position 31481884
************************************************

Here is the master.err listing when the no disk space occured:

050331 15:04:05  InnoDB: Started
/mydisks/a/apps/mysql/bin/mysqld: ready for connections.
Version: '4.0.23-max-log'  socket: '/tmp/mysql.sock'  port: 3306  Official MySQL-max binary
050411 12:00:08 /mydisks/a/apps/mysql/bin/mysqld: Disk is full writing './SETI_BOINC/forum_logging.MYD' (Errcode: 28). Waiting for s
omeone to free space... Retry in 60 secs
050411 12:05:51 /mydisks/a/apps/mysql/bin/mysqld: Disk is full writing './SETI_BOINC/post.MYD' (Errcode: 28). Waiting for someone to
 free space... Retry in 60 secs
050411 12:10:08 /mydisks/a/apps/mysql/bin/mysqld: Disk is full writing './SETI_BOINC/forum_logging.MYD' (Errcode: 28). Waiting for s
omeone to free space... Retry in 60 secs
050411 12:15:51 /mydisks/a/apps/mysql/bin/mysqld: Disk is full writing './SETI_BOINC/post.MYD' (Errcode: 28). Waiting for someone to
 free space... Retry in 60 secs
050411 12:20:08 /mydisks/a/apps/mysql/bin/mysqld: Disk is full writing './SETI_BOINC/forum_logging.MYD' (Errcode: 28). Waiting for s
omeone to free space... Retry in 60 secs

***********************************************************

Here is a listing of the binlog-096 at offset  31481807:

[mysql@jocelyn ~]$ bin/mysqlbinlog --position=31481807 data/jocelyn-bin.096 | less
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
# at 31481807
#050411 11:49:32 server id 13  log_pos 31481807         Query   thread_id=253060        exec_time=0     error_code=0
use SETI_BOINC;
SET TIMESTAMP=1113245372;
ERROR: Error in Log_event::read_log_event(): 'Found invalid event in binary log', data_len: 4410953, event_type: 84
Could not read entry at offset 31481884:Error in log format or read error
delete f^OÃB^B^M^@^@^@ÃD^@^@ÃÂ±E^@^@Ã½^E^@^@^@^@^@
;

How to repeat:
Force an out of space condition on the binlog partition and keep on processing transactions.  
After 35 minutes allocate disk space and the binlog entry should of the last binlog should show data corruption.

Suggested fix:
Do not corrupt the binlog file because it messes up replication.

I have not been able to reproduce this with my current 4.1.11 master/slave test setup,
will try 4.0.23 later ...

I am experiencing this problem on 4.0.18. How can I recover a corrupt binlog? I've recovered a corrupt table due to full disk (repair table), but the binlog still seems corrupt.

This has been taken care of in MySQL 4.1.9, see the 4.1.9 Changelog:

  The binary log file and binary log index file now behave like MyISAM
  when there is a "disk full" or "quota exceeded" error. See also Section 
  A.4.3, “How MySQL Handles a Full Disk”.

That's great, I'm very glad to hear that.

Unfortunately, I can't upgrade to 4.1.9 at the moment. 

Even if I could, that wouldn't solve the current problem with the corrupt binlog, would it? Although I suppose I'd have to re setup replication in that case, which would mean new binlogs, etc. 

Is there any way to repair the binlog and get replication synced up again? Or will I have to redo replication?

Thanks,
Steve

I managed to expirience this problem with mysql-standard-4.1.12-unknown-linux-gnu-x86_64-icc-glibc23 as a master and mysql-max-4.0.15-pc-linux-i686 as a slave. Am currently figuring out how to skip over the broken binlog and continue replication without a complete slave dump and reloading all data from master.

Hi,

I have the exact issue with our system having a disk full issue. The logs and errors shown in this posting exactly match the situation i have here. 

Our Server version is : 3.23.58.

Please provide a solution how I can recover the corrupted binlog file and start the replication from the position it stopped before.

Thank you in advance....