Bug #887 Bad packet between master and slave when master shuts down; corrupts relay log
Submitted: 22 Jul 2003 7:15 Modified: 27 Aug 2003 12:18
Reporter: Guilhem Bichot Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Replication Severity:S2 (Serious)
Version:4.1 OS:Any (all)
Assigned to: Alexey Botchkov CPU Architecture:Any

[22 Jul 2003 7:15] Guilhem Bichot
Description:
030722 13:22:06  Slave SQL thread: I/O error reading event(errno: 22  cur_log->error: 4)
030722 13:22:06  Error reading relay log event: Aborting slave SQL thread because of partial event read
030722 13:22:06  Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (
you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.

Which is a relay log corruption: it happens when one shuts down the master. At that moment, the master writes a STOP event to the binlog. Then the binlog_dump thread sends it to the slave, and apparently it does not work well. Maybe something to do with send_eof() in mysql_binlog_send(): it is different in 4.0 (where a packet of length 1 is sent) and 4.1 (where a packet of length 5 is sent). Relay log corruption results from the write of the 4 (5-1) bytes to the relay log. 
Looks like the slave does not detect 5-byte packets as "eof":
in read_event() in 4.1 (and in 4.0) we have:

  if (len == 1)
  {
     sql_print_error("Slave: received 0 length packet from server, apparent\
 master shutdown: %s",
		     mysql_error(mysql));
     return packet_error;
  }
which works for 1-byte packets (4.0 protocol) but not 5-byte ones (4.1 protocol).

Strange, because it did not happen a few weeks ago.

How to repeat:
Start replication, then shut down the master.

Suggested fix:
I'll work on this.
[22 Jul 2003 7:45] Guilhem Bichot
Passing it to Holyfoot.
Indeed, in 4.1 few weeks ago the slave did not use the 4.1 protocol:
I have tested and it had client capabilities 32901 (which does not contain
16384, which meant CLIENT_PROTOCOL_41 at that time).
In 4.1 today, the slave uses the 4.1 protocol, it has client capabilities
2147525125, which contain 512, which means CLIENT_PROTOCOL_41 now.
2147525125 is
 (CLIENT_LONG_PASSWORD | CLIENT_LONG_FLAG |	  \
                             CLIENT_SECURE_CONNECTION | CLIENT_TRANSACTIONS | \
			     CLIENT_PROTOCOL_41 | CLIENT_SECURE_CONNECTION)
| CLIENT_REMEMBER_OPTIONS,
i.e. without CLIENT_REMEMBER_OPTIONS it is exactly what is
in sql/client_settings.h, file created by Holyfoot.
So it is *possible* that this bug was introduced when changing from mini_client.cc.
[25 Jul 2003 14:03] Guilhem Bichot
A suggested fix could be to take inspiration from Victor's recent changes to mysqlbinlog 4.0:
he changed the EOF test in dump_remote_log_entries()
from
if (len == 1 && net->read_pos[0] == 254)
to
if (len < 8 && net->read_pos[0] == 254)
See ChangeSet@1.1455.61.1, 2003-07-02 16:56:27-04:00.
[27 Aug 2003 12:18] Michael Widenius
Thank you for your bug report. This issue has been committed to our
source repository of that product and will be incorporated into the
next release.

If necessary, you can access the source repository and build the latest
available version, including the bugfix, yourself. More information 
about accessing the source trees is available at
    http://www.mysql.com/doc/en/Installing_source_tree.html

Code fixed as requested