Bug #41400 | slave fails to reconnect on errors | ||
---|---|---|---|
Submitted: | 11 Dec 2008 15:53 | Modified: | 31 May 2009 6:07 |
Reporter: | Mark Callaghan | Email Updates: | |
Status: | Duplicate | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S2 (Serious) |
Version: | 5.0.67 | OS: | Any |
Assigned to: | Assigned Account | CPU Architecture: | Any |
Tags: | reconnect, replication, slave |
[11 Dec 2008 15:53]
Mark Callaghan
[11 Dec 2008 16:54]
Sveta Smirnova
Thank you for the report.
[12 Dec 2008 21:27]
Andrei Elkin
After talking to Sinisa, we reached a consensus. Sinisa said: errors could occur due to network problems ... how about trying a restart after a sleep(). Indeed, the second of the mentioned functions, register_slave_on_master(), can return a transient error, in which case the slave could restart automatically after a timeout. Regarding the other two sub-issues in the description: 1. The errors from get_master_version_and_clock() are all critical, so the slave cannot restart. 2. The misleading comment has been removed in 5.1.
[12 Dec 2008 21:42]
Mark Callaghan
Some errors from get_master_version_and_clock are transient and may be caused by a flaky network. That function runs several queries on the master. Any of them can fail because of a flaky network. Reconnect must be retried in that case.
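To make the retry-after-sleep idea above concrete, here is a minimal sketch in Python. It is not the server's code: `TransientError`, `CriticalError`, and `connect_with_retry` are hypothetical names introduced only for illustration, and the split between transient and critical failures is an assumption about how such errors might be classified.

```python
import time


class TransientError(Exception):
    """Hypothetical: a failure likely caused by the network; worth retrying."""


class CriticalError(Exception):
    """Hypothetical: a failure that retrying cannot fix."""


def connect_with_retry(step, retries=3, delay=1.0, sleep=time.sleep):
    """Run step(); on TransientError, sleep and try again.

    At most `retries` retries are made after the first attempt.
    CriticalError is not caught, so it propagates immediately.
    Returns the number of attempts that were needed.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            step()
            return attempt
        except TransientError:
            if attempt > retries:
                raise  # retry budget exhausted; give up
            sleep(delay)
```

Under these assumptions, a step that fails twice on a flaky network and then succeeds completes on the third attempt, while a critical failure stops the loop at once.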
[22 May 2009 11:03]
Zhenxing He
Hi, I think this problem does not exist in 5.1+. 1) get_master_version_and_clock() only returns 1 when the queries succeed but their results are not as expected, so it should not suffer from flaky network problems. 2) The problem with register_slave_on_master() has already been fixed by BUG#29976. If there are no objections, I'd like to mark this bug as a dup of BUG#29976. Please provide feedback if you disagree, thanks!
[25 May 2009 9:33]
Zhenxing He
Dup of BUG#29976
[25 May 2009 14:36]
Mark Callaghan
I am not fond of the fix for get_master_version_and_clock. get_master_version_and_clock doesn't report an error when queries on the master fail. Instead it makes up values to use or doesn't do the error checks. If that is OK to do, then why not get rid of this code?
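The distinction at issue here, a query that could not be run versus a query that ran but returned an unacceptable answer, can be sketched as follows. This is an illustrative Python sketch, not the server's logic: `Outcome`, `check_master`, `run_query`, and `is_acceptable` are hypothetical names, and treating `OSError` as the sign of a network failure is an assumption.

```python
from enum import Enum


class Outcome(Enum):
    OK = "ok"        # query ran and the answer is acceptable
    RETRY = "retry"  # query could not be run; reconnect and try again
    FATAL = "fatal"  # query ran but the answer is unacceptable


def check_master(run_query, is_acceptable):
    """Classify one check against the master.

    run_query: hypothetical callable that returns a result, or raises
        OSError when the network fails.
    is_acceptable: hypothetical predicate applied to the result.
    """
    try:
        result = run_query()
    except OSError:
        # Report the failure instead of substituting a made-up value.
        return Outcome.RETRY
    return Outcome.OK if is_acceptable(result) else Outcome.FATAL
```

With this shape, a failed query is neither silently ignored nor treated as critical; the caller decides whether to reconnect.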
[26 May 2009 3:21]
Zhenxing He
Hi Mark, I agree that get_master_version_and_clock should not ignore query errors. But since this issue is different from what this bug report originally described, I'd like to open a new bug to handle it. Is that OK?
[31 May 2009 6:07]
Zhenxing He
The get_master_version_and_clock problem will be handled in Bug#45214; closing this bug as a dup.