Bug #41662 corruption of relay logs
Submitted: 21 Dec 2008 4:15 Modified: 22 Dec 2008 9:45
Reporter: Gerald Nowitzky
Status: Not a Bug
Category: MySQL Server: Replication Severity: S1 (Critical)
Version: 5.0.72 OS: Linux (Gentoo)
Assigned to: CPU Architecture: Any

[21 Dec 2008 4:15] Gerald Nowitzky
Description:
As I can't reopen a bug, I am opening a new one. Bug #26489 doesn't seem to be fixed. I still see the slave stopping with the same symptoms in the mysqld log:

081220  2:40:05 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
081220  2:40:05 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysqld-db1n-bin.000951' position 994185738
081220  2:40:05 [Note] Slave: connected to master 'repli@db1n:3306',replication resumed in log 'mysqld-db1n-bin.000951' at position 994185738

and then:

081220  3:04:55 [ERROR] Slave: Error 'You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '5' at line 3' on query. ---
081220  3:04:55 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysqld-db1n-bin.000951' position 994184266

The occurrence of this bug in my case has nothing to do with network outages. I see it on a server at times when that server is under very high load (CPU and I/O) due to a multi-threaded Java process.

How to repeat:
Run replication under heavy I/O and CPU load on the slave.

Suggested fix:
Rework the patch for bug #26489.
[22 Dec 2008 6:24] Sveta Smirnova
Thank you for the report.

Did you upgrade both master and slave?
[22 Dec 2008 9:23] Gerald Nowitzky
Only the slave has been upgraded to 5.0.72; the master is still 5.0.51. However, the master logs are intact, as only one slave fails at a time and can be repaired with set_master_log_pos=xxxx.
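
For illustration, the repair mentioned above can be sketched as follows. This is a minimal sketch, not the reporter's actual procedure: it assumes the repair is done by pointing the slave back at the master coordinates printed in the "We stopped at log ... position ..." message, using the standard STOP SLAVE, CHANGE MASTER TO and START SLAVE statements. The function name reposition_statements is hypothetical, and the generated statements should be reviewed before being run on a real slave.

```python
import re

# Matches the tail of the message the slave SQL thread logs when it aborts,
# as quoted in the report above. The coordinates refer to the master's binlog.
STOP_RE = re.compile(r"We stopped at log '([^']+)' position (\d+)")


def reposition_statements(error_line):
    """Build SQL statements that restart the slave at the logged position.

    Hypothetical helper: it only formats strings and does not connect
    to any server.
    """
    match = STOP_RE.search(error_line)
    if match is None:
        raise ValueError("no stop position found in log line")
    log_file = match.group(1)
    log_pos = int(match.group(2))
    return [
        "STOP SLAVE;",
        # Setting new master coordinates makes the slave discard its
        # existing relay logs and fetch the events from the master again.
        "CHANGE MASTER TO MASTER_LOG_FILE='%s', MASTER_LOG_POS=%d;"
        % (log_file, log_pos),
        "START SLAVE;",
    ]


if __name__ == "__main__":
    line = ("081220  3:04:55 [ERROR] Error running query, slave SQL thread "
            "aborted. We stopped at log 'mysqld-db1n-bin.000951' "
            "position 994184266")
    for statement in reposition_statements(line):
        print(statement)
```

Because the master's binary logs are intact, re-fetching from the reported position replaces the damaged relay log content with a fresh copy.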
[22 Dec 2008 9:45] Sveta Smirnova
Thank you for the feedback.

The fix for bug #26489 is in the part of replication that runs on the master, so you must upgrade the master to get the fix for this bug.

For that reason I am setting this report to "Not a Bug". Feel free to reopen it if you see the same problem after upgrading the master.