MySQL Bugs: #16835: I/O thread between master and Slave stopped giving error

Bug #16835	I/O thread between master and Slave stopped giving error
Submitted:	27 Jan 2006 10:15	Modified:	20 Mar 2006 12:31
Reporter:	Sumit Kumar Roy	Email Updates:
Status:	No Feedback	Impact on me:	None
Category:	MySQL Server	Severity:	S3 (Non-critical)
Version:	4.0.17	OS:	Linux (Linux Kernal 2.24)
Assigned to:		CPU Architecture:	Any

Description:
Hi,
Weh have a HA system where two instance of mysql server runs in master and slave mode. After a crash (of the node) the slave came up properly and started replication. After some time it gave an error "Slave: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted"
Below is the complete log of the Slave DB.

060124 16:12:04  Slave I/O thread: connected to master 'arcslv@cp_dbsrv:3356',  replication started in log 'db1_log_bin.016' at position 9615
060124 16:12:04  Slave SQL thread initialized, starting replication in log 'db1_log_bin.016' at position 9522, relay log '/usr/local/archivedb/data_s/standby_rlog.011' position: 1954514
060124 22:27:51  InnoDB: Database was not shut down normally.
InnoDB: Starting recovery from log files...
InnoDB: Starting log scan based on checkpoint at
InnoDB: log sequence number 0 801109722
InnoDB: Doing recovery: scanned up to log sequence number 0 806352384
InnoDB: Doing recovery: scanned up to log sequence number 0 807159557
060124 22:27:53  InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percents: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
InnoDB: Apply batch completed
InnoDB: In a MySQL replication slave the last master binlog file
InnoDB: position 0 181536483, file name db1_log_bin.018
InnoDB: Last MySQL binlog file position 0 181536357, file name /usr/local/archivedb/log_s/db2_log_bin.018
060124 22:27:56  InnoDB: Flushing modified pages from the buffer pool...
060124 22:27:57  InnoDB: Started
060124 22:27:58  Slave I/O thread: connected to master 'arcslv@cp_dbsrv:3356',  replication started in log 'db1_log_bin.018' at position 257816943
060124 22:27:58  Slave SQL thread initialized, starting replication in log 'db1_log_bin.018' at position 181536295, relay log '/usr/local/archivedb/data_s/standby_rlog.050' position: 79835
/usr/sbin/mysqld: ready for connections.
Version: '4.0.17-standard-log'  socket: '/usr/local/archivedb/socks/slave.sock2'  port: 3357
060124 22:31:12  Error reading packet from server:  (server_errno=1158)
060124 22:31:12  Slave I/O thread: Failed reading log event, reconnecting to retry, log 'db1_log_bin.018' position 265509146
060124 22:31:12  Slave: connected to master 'arcslv@cp_dbsrv:3356',replication resumed in log 'db1_log_bin.018' at position 265509146
060124 22:33:59  Error reading packet from server:  (server_errno=1158)
060124 22:33:59  Slave I/O thread: Failed reading log event, reconnecting to retry, log 'db1_log_bin.018' position 273882629
060124 22:33:59  Slave: connected to master 'arcslv@cp_dbsrv:3356',replication resumed in log 'db1_log_bin.018' at position 273882629
060124 22:36:22  Error reading packet from server: Lost connection to MySQL server during query (server_errno=2013)
060124 22:36:22  Slave I/O thread: Failed reading log event, reconnecting to retry, log 'db1_log_bin.019' position 8208106
060124 22:36:22  Slave: connected to master 'arcslv@cp_dbsrv:3356',replication resumed in log 'db1_log_bin.019' at position 8208106
060124 22:38:44  Error reading packet from server: Lost connection to MySQL server during query (server_errno=2013)
060124 22:38:44  Slave I/O thread: Failed reading log event, reconnecting to retry, log 'db1_log_bin.019' position 16636428
060124 22:38:44  Slave: connected to master 'arcslv@cp_dbsrv:3356',replication resumed in log 'db1_log_bin.019' at position 16636428
060124 22:51:15  Error in Log_event::read_log_event(): 'read error', data_len: 152, event_type: 2
060124 22:51:15  Error reading relay log event: slave SQL thread aborted because of I/O error
060124 22:51:15  Slave: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave. Error_code: 0
060124 22:51:15  Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'db1_log_bin.018' position 257813975
060124 22:51:15  Slave I/O thread killed while reading event
060124 22:51:15  Slave I/O thread exiting, read up to log 'db1_log_bin.020', position 871715

How to repeat:
described above.

Suggested fix:
None

Thank you for a problem report. Why do you think it is a bug in MySQL code? As you may read in the log, there are many possible reasons for such a failure, including hardware problems. Have you tried to run mysqlbinlog on master's
binary logand/ or slave's relay log? If you did, what are the results?

Can you provide a simple set of steps/SQL statements to reproduce the problem?

I have checked /var/log/messages. There were no abnormalities. Also I checked ‘db1_log_bin.018', it was OK. We use external RAID disks to store the data. The RAID logs are also OK. Actually that time a DB backup was initiated. The Slave was asked to synchronized with the master with "select master_pos_wait()" command. The Slave DB came out of the synchronization process at "060124 22:51:15" prematurely, and the backup was not proper. We take backup from the Slave DB, so that the Master DB is not affected and I/O can go on properly. After doing a restoration from the improper Backup, we have been reported a loss of data.
During the stopping of the I/O thread atleast 30 sessions were active and all were executing I/O on the DB. So it is nearly impossible for me to give you a SQL statement to reproduce the problem.

Check db1_log_bin.019 with mysqlbinlog, please. Inform about the results.

It is surely impossible to reproduce a kind of a system you have here, so to call this problem a bug we need to figure out a small and exact sequence of steps to get the same error. By the way, do you have any plans to upgrade to 4.0.26 at least?

Hi,
Sorry for the delay. Our client has obtained the licence for 4.0.17. So it depends on our client whether they want to upgrade or not. But till then we need to find an solution for this problem.

Please, run mysqlbinlog db1_log_bin.019 and upload the results (at least, the last part of the results).

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".