Description:
A disk full occurred on the master, then mysqld on the slave reported in the .err log:
070914 12:50:06 mysqld started
/usr/sbin/mysqld: ready for connections.
Version: '4.1.12-standard-log' socket: '/tmp/mysql.sock4' port: 3306 MySQL Community
Edition - Standard (GPL)
070914 12:50:07 [Note] Slave SQL thread initialized, starting replication in log 'FIRST'
at position 0, relay log './qa-dbslave1-relay-bin.0000
01' position: 4
070914 12:50:07 [Note] Slave I/O thread: connected to master
'replicate@10.32.68.13:3306', replication started in log 'FIRST' at position 4
071006 8:29:33 [ERROR] Error reading packet from server: binlog truncated in the middle
of event (server_errno=1236)
071006 8:29:33 [ERROR] Got fatal error 1236: 'binlog truncated in the middle of event'
from master when reading data from binary log
071006 8:29:33 [ERROR] Slave I/O thread exiting, read up to log
'qa-dbmaster-bin.000003', position 656764771
071008 11:32:12 [Note] Slave I/O thread: connected to master
'replicate@10.32.68.13:3306', replication started in log 'qa-dbmaster-bin.000003'
at position 656764771
071008 11:32:12 [ERROR] Error reading packet from server: bogus data in log event
(server_errno=1236)
071008 11:32:12 [ERROR] Got fatal error 1236: 'bogus data in log event' from master when
reading data from binary log
071008 11:32:12 [ERROR] Slave: Error 'You have an error in your SQL syntax; check the
manual that corresponds to your MySQL server version for
the right syntax to use near '[^GG^E^M' at line 4' on query. Default database: 'fp12_mp'.
Query: 'UPDATE `event` SET
`status` = 'processing',
`worker_id` = '4',
h[^GG^E^M', Error_code: 1064
071008 11:32:12 [ERROR] Slave I/O thread exiting, read up to log
'qa-dbmaster-bin.000003', position 656765214
071008 11:32:12 [ERROR] Error running query, slave SQL thread aborted. Fix the problem,
and restart the slave SQL thread with "SLAVE START". We
stopped at log 'qa-dbmaster-bin.000003' position 656764771
On master the error log looked like this:
070914 12:50:06 mysqld started
/usr/sbin/mysqld: ready for connections.
Version: '4.1.12-standard-log' socket: '/tmp/mysql.sock4' port: 3306 MySQL Community
Edition - Standard (GPL)
071006 11:54:48 [ERROR] /usr/sbin/mysqld: Disk is full writing './qa-dbmaster-bin.000003'
(Errcode: 28). Waiting for someone to free space... R
etry in 60 secs
071006 12:04:48 [ERROR] /usr/sbin/mysqld: Disk is full writing './qa-dbmaster-bin.000003'
(Errcode: 28). Waiting for someone to free space... R
etry in 60 secs
071006 12:14:48 [ERROR] /usr/sbin/mysqld: Disk is full writing './qa-dbmaster-bin.000003'
(Errcode: 28). Waiting for someone to free space... R
etry in 60 secs
071006 12:24:48 [ERROR] /usr/sbin/mysqld: Disk is full writing './qa-dbmaster-bin.000003'
(Errcode: 28). Waitin071017 17:16:25 [Note] /usr/sbin
/mysqld: Normal shutdown
071017 17:16:27 [Note] /usr/sbin/mysqld: Shutdown complete
071017 17:16:27 mysqld ended
You can see that the server could not write into the error_log anymore as well.
So the binary log got corrupted and the slave could not recover replication after freeing
the disk on the master.
I think that mysqld wrote to the binary log an event and it did not get fully written
into the log.
Then with the next event, the OS reports the disk full and mysqld will retry writing
then. When room is available again, event is just appended to binlog after the incomplete
previous event.
How to repeat:
See description.
Suggested fix:
Let mysqld check for enough space in the binary log before writing into it.
If there is not enough space, postpone writing until there is room again.
Description: A disk full occurred on the master, then mysqld on the slave reported in the .err log: 070914 12:50:06 mysqld started /usr/sbin/mysqld: ready for connections. Version: '4.1.12-standard-log' socket: '/tmp/mysql.sock4' port: 3306 MySQL Community Edition - Standard (GPL) 070914 12:50:07 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 0, relay log './qa-dbslave1-relay-bin.0000 01' position: 4 070914 12:50:07 [Note] Slave I/O thread: connected to master 'replicate@10.32.68.13:3306', replication started in log 'FIRST' at position 4 071006 8:29:33 [ERROR] Error reading packet from server: binlog truncated in the middle of event (server_errno=1236) 071006 8:29:33 [ERROR] Got fatal error 1236: 'binlog truncated in the middle of event' from master when reading data from binary log 071006 8:29:33 [ERROR] Slave I/O thread exiting, read up to log 'qa-dbmaster-bin.000003', position 656764771 071008 11:32:12 [Note] Slave I/O thread: connected to master 'replicate@10.32.68.13:3306', replication started in log 'qa-dbmaster-bin.000003' at position 656764771 071008 11:32:12 [ERROR] Error reading packet from server: bogus data in log event (server_errno=1236) 071008 11:32:12 [ERROR] Got fatal error 1236: 'bogus data in log event' from master when reading data from binary log 071008 11:32:12 [ERROR] Slave: Error 'You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '[^GG^E^M' at line 4' on query. Default database: 'fp12_mp'. Query: 'UPDATE `event` SET `status` = 'processing', `worker_id` = '4', h[^GG^E^M', Error_code: 1064 071008 11:32:12 [ERROR] Slave I/O thread exiting, read up to log 'qa-dbmaster-bin.000003', position 656765214 071008 11:32:12 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'qa-dbmaster-bin.000003' position 656764771 On master the error log looked like this: 070914 12:50:06 mysqld started /usr/sbin/mysqld: ready for connections. Version: '4.1.12-standard-log' socket: '/tmp/mysql.sock4' port: 3306 MySQL Community Edition - Standard (GPL) 071006 11:54:48 [ERROR] /usr/sbin/mysqld: Disk is full writing './qa-dbmaster-bin.000003' (Errcode: 28). Waiting for someone to free space... R etry in 60 secs 071006 12:04:48 [ERROR] /usr/sbin/mysqld: Disk is full writing './qa-dbmaster-bin.000003' (Errcode: 28). Waiting for someone to free space... R etry in 60 secs 071006 12:14:48 [ERROR] /usr/sbin/mysqld: Disk is full writing './qa-dbmaster-bin.000003' (Errcode: 28). Waiting for someone to free space... R etry in 60 secs 071006 12:24:48 [ERROR] /usr/sbin/mysqld: Disk is full writing './qa-dbmaster-bin.000003' (Errcode: 28). Waitin071017 17:16:25 [Note] /usr/sbin /mysqld: Normal shutdown 071017 17:16:27 [Note] /usr/sbin/mysqld: Shutdown complete 071017 17:16:27 mysqld ended You can see that the server could not write into the error_log anymore as well. So the binary log got corrupted and the slave could not recover replication after freeing the disk on the master. I think that mysqld wrote to the binary log an event and it did not get fully written into the log. Then with the next event, the OS reports the disk full and mysqld will retry writing then. When room is available again, event is just appended to binlog after the incomplete previous event. How to repeat: See description. Suggested fix: Let mysqld check for enough space in the binary log before writing into it. If there is not enough space, postpone writing until there is room again.