Bug #13023 | With --log-slave-updates Exec_master_log_pos of SQL thread lags IO | ||
---|---|---|---|
Submitted: | 6 Sep 2005 22:23 | Modified: | 18 Oct 2005 9:36 |
Reporter: | John Chung | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S1 (Critical) |
Version: | 4.1.12-standard for ibm-linux-gnu on s39 | OS: | Linux (SUSE Linux 9) |
Assigned to: | Guilhem Bichot | CPU Architecture: | Any |
[6 Sep 2005 22:23]
John Chung
[6 Sep 2005 22:28]
John Chung
Forgot to mention something important. During my testing, IO and SQL thread was never down. Nor, there was any error logged in mysql error log.
[14 Sep 2005 19:19]
John Chung
Although SQL and IO threads are up and running, SQL Thread does not move forward.
[10 Oct 2005 21:11]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/internals/30878
[11 Oct 2005 7:27]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/internals/30899
[12 Oct 2005 11:32]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/internals/30955
[12 Oct 2005 22:40]
Guilhem Bichot
fixed in 5.0.15 and 4.1.16: ChangeSet@1.2473, 2005-10-12 22:29:36+02:00, guilhem@mysql.com To force a restart at the end of test, the option file must be non-empty, it's not enough if it exists and is empty. and ChangeSet@1.2471, 2005-10-12 13:29:55+02:00, guilhem@mysql.com Fix for BUG#13023: "SQL Thread is up but doesn't move forward". Details in slave.cc;
[12 Oct 2005 22:48]
Guilhem Bichot
info: the bug was that in a *circular replication* A->B->C->...->A (with possibly C not existing), *IF* B had --log-slave-updates, then SHOW SLAVE STATUS on A could report an Exec_master_log_pos smaller than the binlog position displayed by SHOW MASTER STATUS on B even though A was fully caught up in terms of data changes. This happened if the last thing which had occured on the A&B setup is an update on A. The reason was that an UPDATE on A goes to A's binlog, then to B, into B's binlog (because of log-slave-updates) then is sent back to A. A's slave I/O thread ignores the event (because it recognizes it comes from itself in fact, using server ids comparison), and so does not write it to A's relay log, so A's slave SQL thread will never know about this event. So Exec_master_log_pos (as it's maintained by the SQL thread) will never take this event into account. While this event is of course taken into account by SHOW MASTER STATUS on B as it's in B's binlog. Hence the position mismatch. A test mysql-test/t/rpl_dual_pos_advance.test has this scenario if needed.
[13 Oct 2005 2:03]
Mike Hillyer
Documented in 4.1.16 and 5.0.15 changelogs: <listitem> <para> SQL thread could hang in multi-master circular replication. (Bug #13023) </para> </listitem>
[13 Oct 2005 6:58]
Guilhem Bichot
Hi Mike, No, no, it's really like I wrote before, the symptom is a too low Exec_master_log_pos. Not a hang. The test I pushed was designed to hang in this case, because that was an easy way to test (it hans because it does a MASTER_POS_WAIT() of a position which never advances, thus the client thread which issues MASTER_POS_WAIT() hangs but not the slave SQL or I/O threads). Please could you change the changelog entry to rather insist on the wrong Exec_master_log_pos (that's the reason the customer reported the bug).
[13 Oct 2005 22:17]
James Day
Mike, New title for this bug. Just changed from: SQL Thread is up but doesn't move forward to: With --log-slave-updates Exec_master_log_pos of SQL thread lags IO [thread in circular replication] Removing the words "multi-master" is to eliminate the chance for confusion with multiple masters for one server, at the request of the original bug reporter. This revised wording might do well for the changelog. New bug 13861 covers the unusual case where START SLAVE UNTIL is used, not resolved by this patch.