Bug #32228 A disk full makes binary log corrupt.
Submitted: 9 Nov 2007 14:18 Modified: 28 Sep 2009 15:50
Reporter: Arnoud Witt
Status: Verified
Category:Server: Replication Severity:S2 (Serious)
Version:4.1.12-standard-log, 4.1.23, 5.0.50, 5.1.22 OS:Any
Assigned to: Sven Sandberg Target Version:
Tags: bfsm_2007_12_06, bfsm_2007_11_15
Triage: Triaged: D2 (Serious)

[9 Nov 2007 14:18] Arnoud Witt
Description:
A disk full occurred on the master, then mysqld on the slave reported in the .err log:

070914 12:50:06  mysqld started
/usr/sbin/mysqld: ready for connections.
Version: '4.1.12-standard-log'  socket: '/tmp/mysql.sock4'  port: 3306  MySQL Community
Edition - Standard (GPL)
070914 12:50:07 [Note] Slave SQL thread initialized, starting replication in log 'FIRST'
at position 0, relay log './qa-dbslave1-relay-bin.0000
01' position: 4
070914 12:50:07 [Note] Slave I/O thread: connected to master
'replicate@10.32.68.13:3306',  replication started in log 'FIRST' at position 4
071006  8:29:33 [ERROR] Error reading packet from server: binlog truncated in the middle
of event (server_errno=1236)
071006  8:29:33 [ERROR] Got fatal error 1236: 'binlog truncated in the middle of event'
from master when reading data from binary log
071006  8:29:33 [ERROR] Slave I/O thread exiting, read up to log
'qa-dbmaster-bin.000003', position 656764771
071008 11:32:12 [Note] Slave I/O thread: connected to master
'replicate@10.32.68.13:3306',  replication started in log 'qa-dbmaster-bin.000003'
 at position 656764771
071008 11:32:12 [ERROR] Error reading packet from server: bogus data in log event
(server_errno=1236)
071008 11:32:12 [ERROR] Got fatal error 1236: 'bogus data in log event' from master when
reading data from binary log
071008 11:32:12 [ERROR] Slave: Error 'You have an error in your SQL syntax; check the
manual that corresponds to your MySQL server version for
the right syntax to use near '[^GG^E^M' at line 4' on query. Default database: 'fp12_mp'.
Query: 'UPDATE `event` SET
                            `status` = 'processing',
                            `worker_id` = '4',
h[^GG^E^M', Error_code: 1064
071008 11:32:12 [ERROR] Slave I/O thread exiting, read up to log
'qa-dbmaster-bin.000003', position 656765214
071008 11:32:12 [ERROR] Error running query, slave SQL thread aborted. Fix the problem,
and restart the slave SQL thread with "SLAVE START". We
 stopped at log 'qa-dbmaster-bin.000003' position 656764771

On master the error log looked like this:

070914 12:50:06  mysqld started
/usr/sbin/mysqld: ready for connections.
Version: '4.1.12-standard-log'  socket: '/tmp/mysql.sock4'  port: 3306  MySQL Community
Edition - Standard (GPL)
071006 11:54:48 [ERROR] /usr/sbin/mysqld: Disk is full writing './qa-dbmaster-bin.000003'
(Errcode: 28). Waiting for someone to free space... R
etry in 60 secs
071006 12:04:48 [ERROR] /usr/sbin/mysqld: Disk is full writing './qa-dbmaster-bin.000003'
(Errcode: 28). Waiting for someone to free space... R
etry in 60 secs
071006 12:14:48 [ERROR] /usr/sbin/mysqld: Disk is full writing './qa-dbmaster-bin.000003'
(Errcode: 28). Waiting for someone to free space... R
etry in 60 secs
071006 12:24:48 [ERROR] /usr/sbin/mysqld: Disk is full writing './qa-dbmaster-bin.000003'
(Errcode: 28). Waitin071017 17:16:25 [Note] /usr/sbin
/mysqld: Normal shutdown

071017 17:16:27 [Note] /usr/sbin/mysqld: Shutdown complete

071017 17:16:27  mysqld ended

You can see that the server could not write into the error_log anymore as well.

So the binary log got corrupted and the slave could not recover replication after freeing
the disk on the master.

I think that mysqld wrote to the binary log an event and it did not get fully written
into the log.
Then with the next event, the OS reports the disk full and mysqld will retry writing
then. When room is available again, event is just appended to binlog after the incomplete
previous event.

How to repeat:
See description.

Suggested fix:
Let mysqld check for enough space in the binary log before writing into it.
If there is not enough space, postpone writing until there is room again.
[13 Nov 2007 16:52] Sveta Smirnova
Thank you for the report.

I can not repeat described behaviour with current version of MySQL. Please upgrade.
[14 Nov 2007 14:48] David Shrewsbury
Verified against 4.1.12 using InnoDB and transactions. Could not reproduce using MyISAM
only.
[5 May 2008 15:11] Sam Tresler
Confirmed on Debian Etch, Mysql 5.0.32  

Disk full on master, after space is freed up we get:

May  5 08:29:17 colo2 mysqld[2306]: 080505  8:29:17 [Note] Slave I/O thread: connected to
master 'repl@dbmaster:3306',  replication started in log 'mysql-bin.010495' at position
693169273
May  5 08:29:17 colo2 mysqld[2306]: 080505  8:29:17 [ERROR] Error reading packet from
server: binlog truncated in the middle of event ( server_errno=1236)
May  5 08:29:17 colo2 mysqld[2306]: 080505  8:29:17 [ERROR] Got fatal error 1236: 'binlog
truncated in the middle of event' from master when reading data from binary log
May  5 08:29:17 colo2 mysqld[2306]: 080505  8:29:17 [Note] Slave I/O thread exiting, read
up to log 'mysql-bin.010495', position 693169273
[28 May 2008 13:31] Sean Colombo
Confirmed in:
mysql  Ver 14.12 Distrib 5.0.22, for redhat-linux-gnu (i686) using readline 5.0

Is there a known workaround to get replication up and running again?
[28 May 2008 13:38] Sean Colombo
It appears the workaround is to start replication from scratch.
http://lists.mysql.com/mysql/208931