Bug #35785 Replication stops after upgrading to 5.1.23
Submitted: 3 Apr 2008 7:32 Modified: 9 May 2008 10:19
Reporter: Stefan Hinz Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server: Documentation Severity:S3 (Non-critical)
Version:5.1.23 OS:Any
Assigned to: Jon Stephens CPU Architecture:Any

[3 Apr 2008 7:32] Stefan Hinz
Description:
After upgrading from 5.1.22 to 5.1.23, replication stopped for all my slave servers. This looks like a bug that could be a regression of Bug#28597 which has been fixed some time ago. SHOW SLAVE STATUS gives me this:

Slave_IO_State: Waiting to reconnect after a failed master event read
 Master_Host: themis
...
 Last_IO_Errno: 2013
 Last_IO_Error: error reconnecting to master 'replikator@themis:3306' - retry-time: 60  retries: 86400
 Last_SQL_Errno: 0
 Last_SQL_Error:

On the slave hosts, the error logs end like this:

...
080324  1:39:05 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'FIRST' at postion 4
080324  1:39:05 [ERROR] Error reading packet from server: File './themis-bin.000001' not found (Errcode: 2) ( server_errno=29)
080324  1:40:05 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'FIRST' at postion 4
080324  1:40:05 [ERROR] Error reading packet from server: File './themis-bin.000001' not found (Errcode: 2) ( server_errno=29)
080324  1:41:05 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'FIRST' at postion 4
080324  1:41:05 [ERROR] Error reading packet from server: File './themis-bin.000001' not found (Errcode: 2) ( server_errno=29)

On the master host (themis), the error log ends like this:

...
080324  1:37:05 [ERROR] Failed to open log (file './themis-bin.000001', errno 2)
080324  1:38:05 [ERROR] Failed to open log (file './themis-bin.000001', errno 2)
080324  1:39:05 [ERROR] Failed to open log (file './themis-bin.000001', errno 2)
080324  1:40:05 [ERROR] Failed to open log (file './themis-bin.000001', errno 2)

On the master host, the themis-bin.index file ends like this:

...
/var/lib/mysql/themis-bin.000118
/var/lib/mysql/themis-bin.000119
./themis-bin.000120
./themis-bin.000121
./themis-bin.000122
./themis-bin.000123
./themis-bin.000124

Looks like someone has changed the format from absolute to relative paths.

What puzzles me is that the error log entry indicates that the slaves are trying to read the master's *very first* binlog file; I've removed many old binlog files a long time ago with no issues so far. Is that an unrelated bug?

How to repeat:
Upgrade a master-slave replication setup from 5.1.22 to 5.1.23 and see replication slaves stop replicating.

Suggested fix:
Looks like the format of the <host>-bin.index file entries should be changed back from relative to absolute path names.
[3 Apr 2008 20:51] Sveta Smirnova
Thank you for the report.

This incompatibility is result of bug #28597 and fix of it.

Before bug #28597 was introduced binlog index file contained relative paths like

./themis-bin.000074

In time when bug #28597 existed binlog index file containde absolute paths like:

/var/lib/mysql/themis-bin.000076

Bug #28597 was fixed in version 5.1.23, so binlog index file contains relative paths again.

So behavior is correct.

But manual does not have an alert about upgrade between these versions could lead to replication broke. So I set category of the report to "Documentation"
[3 Apr 2008 21:08] Stefan Hinz
Thanks for turning this into a docs problem! ;-) However, I can't see what the fix is supposed to be. Or should we just state in the docs that you shouldn't upgrade because this will break replication? I don't think so. So please indicate what the docs should say to avoid the problem.
[3 Apr 2008 21:29] Sveta Smirnova
I think docs should alert:

upgrade master from earlier versions to 5.1.18-5.1.22 can lead to creation of invalid binary log index file

upgrade master from versions 5.1.18-5.1.22 to 5.1.23 can lead to creation of invalid binary log index file

Same for 5.0 and 6.0 branches.

Maybe some workaround like "use log-bin=explicit_name" or "reset master" should be indicated.
[6 Apr 2008 7:31] Stefan Hinz
Maybe mention a workaround?? Because if there's no workaround I wouldn't consider this a documentation bug. Think of the consequences if there's no workaround: Users with a replication setup effectively can't update MySQL!
[7 Apr 2008 17:54] Sveta Smirnova
I am sorry, missed your workaround from bug #28597.

(Workaround: After manually changing the last line recorded by the "old" (5.1.16) server
from:

./themis-bin.000075

to:

/var/lib/mysql/themis-bin.000075

and restarting the slave server, replication resumes with no issues.)

So for this bug one should reverse execute this action: rename /var/lib/mysql/themis-bin.000075 to themis-bin.000075
[7 Apr 2008 22:45] Stefan Hinz
The workaround I had indicated for the old bug doesn't work. Indeed, that was the first thing I tried, but to no avail.

See also my original remark about another (unrelated?) bug. From looking at the error look, I can see that the slaves are trying to read the *very first* binlog file from the master (that's gone away a long time ago). I can't say whether or not that's the reason why the workaround doesn't work this time but I think it is.

I'm afraid we need a different workaround this time - please indicate what that should be.
[8 Apr 2008 6:26] Jon Stephens
Stefan,

Did you remove the references to the log files that don't actually exist any longer from the index file?

Also, I'm not sure whether or not this is applicable, but you might want to look at http://dev.mysql.com/doc/refman/5.1/en/purge-master-logs.html - especially the last paragraph, added just a couple of days ago, and the bugs referenced therein (Bug #18199, Bug #18453) .
[25 Apr 2008 19:43] Stefan Hinz
I couldn't make replication start again while I was on MySQL 5.1.23. Upgrading to 5.1.24 magically solved the problem: Without having done anything (except upgrading using MySQL's RPM files) replication started again, and slaves caught up to the master's current position as if the problem had never existed. I'm stunned.