MySQL Bugs: #1014: Exec_master_log_pos *occasionally* returns wrong value in cyclical replication

Bug #1014	Exec_master_log_pos occasionally returns wrong value in cyclical replication
Submitted:	8 Aug 2003 0:18	Modified:	21 Aug 2003 7:03
Reporter:	Nathan Tanner	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Replication	Severity:	S2 (Serious)
Version:	4.0.14	OS:	Linux (RedHat 9.0/Windows2000)
Assigned to:	Guilhem Bichot	CPU Architecture:	Any

Description:
In a cyclical replication setup, the "Exec_master_log_pos" in a SHOW SLAVE STATUS command on the SLAVE apparently returns the Exec_master_log_pos value from its MASTER (which is also the SLAVE's SLAVE).

This seems to only occur when there is no replication activity, ie. no updates being done on either master.

How to repeat:
Set up two MySQL replication servers, A (Windows2000) and B (RedHat9.0). Set the MASTER of A to 'B' and the MASTER of B to 'A'.

Run a SHOW SLAVE STATUS on 'A':

mysql> show slave status\G
*************************** 1. row ***************************
          Master_Host: B
          Master_User: *******
          Master_Port: 3306
        Connect_retry: 60
      Master_Log_File: MyBINDr-bin.009
  Read_Master_Log_Pos: 11861518
       Relay_Log_File: VRS-relay-bin.003
        Relay_Log_Pos: 3867906
Relay_Master_Log_File: MyBINDr-bin.009
     Slave_IO_Running: Yes
    Slave_SQL_Running: Yes
      Replicate_do_db:
  Replicate_ignore_db:
           Last_errno: 0
           Last_error:
         Skip_counter: 0
  Exec_master_log_pos: 11861518
      Relay_log_space: 3867803
1 row in set (0.17 sec)

Do the same on 'B':

mysql> show slave status\G
*************************** 1. row ***************************
          Master_Host: A
          Master_User: *****
          Master_Port: 3306
        Connect_retry: 60
      Master_Log_File: VRS-bin.007
  Read_Master_Log_Pos: 3892546
       Relay_Log_File: MyBINDr-relay-bin.001
        Relay_Log_Pos: 3410172
Relay_Master_Log_File: VRS-bin.007
     Slave_IO_Running: Yes
    Slave_SQL_Running: Yes
      Replicate_do_db: 
  Replicate_ignore_db: 
           Last_errno: 0
           Last_error: 
         Skip_counter: 0
  Exec_master_log_pos: 11861518
      Relay_log_space: 3410172
1 row in set (0.00 sec)

Notice that in both cases the Exec_master_log_pos is the same. Also notice that the SLAVE STATUS of B has Read_Master_Log_Pos: 3892546 (3,892,546) and Exec_master_log_pos: 11861518 (11,861,518), which isn't possible... B couldn't have read 8 million bytes past the end of the Master Log File.

The Exec_master_log_pos variable will show a correct value as soon as an update is done on server 'A':

(After an update is done on 'A')

mysql> show slave status\G
*************************** 1. row ***************************
          Master_Host: 66.255.18.66
          Master_User: bindr
          Master_Port: 3306
        Connect_retry: 60
      Master_Log_File: VRS-bin.007
  Read_Master_Log_Pos: 3892823
       Relay_Log_File: MyBINDr-relay-bin.001
        Relay_Log_Pos: 3410449
Relay_Master_Log_File: VRS-bin.007
     Slave_IO_Running: Yes
    Slave_SQL_Running: Yes
      Replicate_do_db: 
  Replicate_ignore_db: 
           Last_errno: 0
           Last_error: 
         Skip_counter: 0
  Exec_master_log_pos: 3892823
      Relay_log_space: 3410449
1 row in set (0.00 sec)

Suggested fix:
Make sure Exec_master_log_pos is the actual status of the SLAVE, even if it is the master of its own master.

Hi,
Thank you for this excellent first information.
A crucial question: does any of your two slaves run with option skip-slave-updates?
You can know this by doing
show variables like 'log_slave_updates';
on both slaves. You should get one of these answers: ON or OFF.

Yes, both servers have the 'log-slave-updates' option 'ON'... otherwise, cyclical replication would not occur. Also, the server-id variable for each is completely different.

In case you are wondering about 'slave-skip-errors', this option is *not* set for either.

Well, if there are only *2* servers (like in your A-B example), log-slave-updates is not necessary, it's rather a waste of CPU and disk.
log-slave-updates means you want to propagate the update run on a slave by the slave threads to another machine. But if you only have 2 machines, this is useless, because the update run on a slave by the slave threads comes from the master of this slave, so no need to propagate it back to the master, which will just ignore it.
log-slave-updates is necessary when you have at least 3 servers doing circular replication is a ring. And in that case, yes, you can sometimes get unexpected values in SHOW SLAVE STATUS (unexpected, but very sensible: it means the slave has seen an event originating from himself, which was originally at position xxx in this slave's binlog).
If you have only 2 servers you should probably not use log-slave-updates.

Thank you, I remember now that I originally *did* have three servers in the replication cycle so I needed log-slave-updates = ON.

So am I correct in saying the the Exec_master_log_pos actually returns the Orig_log_pos from the Master's binlog (such as when a SHOW BINLOG EVENTS is run on the Master), not the Pos value? The reason that this is an issue for me is because I am building an application that will switch from one master to another. However, to do this I need to be able to store an Exec_master_log_pos value that I can reliably use later on in a CHANGE MASTER TO statement. I see how I can work around this in a cyclical setup with two servers, but three may pose a problem.

I am sorry, this still seems like a bug (the following are actual values):

Show Slave Status on Server B returns the following (note the Exec_master_log_pos value):
mysql> show slave status\G
*************************** 1. row ***************************
          Master_Host: ServerA
          Master_User: ******
          Master_Port: 3306
        Connect_retry: 60
      Master_Log_File: VRS-bin.007
  Read_Master_Log_Pos: 5703985
       Relay_Log_File: MyBINDr-relay-bin.001
        Relay_Log_Pos: 5221611
Relay_Master_Log_File: VRS-bin.007
     Slave_IO_Running: Yes
    Slave_SQL_Running: Yes
      Replicate_do_db: 
  Replicate_ignore_db: 
           Last_errno: 0
           Last_error: 
         Skip_counter: 0
  Exec_master_log_pos: 13647535
      Relay_log_space: 5221611
1 row in set (0.00 sec)

On Server A (server-id 10653) we have this binlog event:
| ServerA-bin.007 | 5703985 | Query      | 10653     | 5703985

On Server B (server-id 999999) we have the same binlog event (because log-slave-updates is on):
| ServerB-bin.009 | 13647535 | Query      | 10653     | 5703985      |

Server B shouldn't care about its own binlog events when it is displaying its SLAVE STATUS. It should display 5703985 for it's Exec_master_log_pos, because that is really where it has executed up to in its master. This binlog event isn't even an event that originated from Server B... it's an event originating from Server A that gets LOGGED in Server B's binlogs, and for some reason the position in Server B's binlog is being reported in Exec_master_log_pos, when it should really be Server A's binlog position that gets reported. Can this bug report be reopened, or do I need to provide more info?

> Show Slave Status on Server B returns the following (note the
> Exec_master_log_pos value):
> mysql> show slave status\G
> *************************** 1. row ***************************
>           Master_Host: ServerA
>           Master_User: ******
>           Master_Port: 3306
>         Connect_retry: 60
>       Master_Log_File: VRS-bin.007
>   Read_Master_Log_Pos: 5703985
>        Relay_Log_File: MyBINDr-relay-bin.001
>         Relay_Log_Pos: 5221611
> Relay_Master_Log_File: VRS-bin.007
>      Slave_IO_Running: Yes
>     Slave_SQL_Running: Yes
>       Replicate_do_db: 
>   Replicate_ignore_db: 
>            Last_errno: 0
>            Last_error: 
>          Skip_counter: 0
>   Exec_master_log_pos: 13647535
>       Relay_log_space: 5221611
> 1 row in set (0.00 sec)
> 
> On Server A (server-id 10653) we have this binlog event:
> | ServerA-bin.007 | 5703985 | Query      | 10653     | 5703985
> 
> On Server B (server-id 999999) we have the same binlog event (because
> log-slave-updates is on):
> | ServerB-bin.009 | 13647535 | Query      | 10653     | 5703985      |
> 
> Server B shouldn't care about its own binlog events when it is
> displaying its SLAVE STATUS. It should display 5703985 for it's
> Exec_master_log_pos, because that is really where it has executed up to
> in its master. This binlog event isn't even an event that originated
> from Server B... it's an event originating from Server A that gets
> LOGGED in Server B's binlogs, and for some reason the position in
> Server B's binlog is being reported in Exec_master_log_pos, when it
> should really be Server A's binlog position that gets reported. Can
> this bug report be reopened, or do I need to provide more info?

The event has position 13647535 in B's binlog, and Exec_master_log_pos is 13647535 on B. Which means the slave SQL thread on B has reached the position JUST BEFORE the event we're talking about. It has not executed the event we're talking about, yet. The event which caused 13647535 to be in Exec_master_log_pos, is the event before that one, and is likely to originate from server B (then it goes to B's binlog, then to A, then it goes to A's binlog, then it goes back to B and B's SQL slave thread reads it and ignores it (though it modifies Exec_master_log_pos to reflect that B's SQL slave thread has read it).

This will disappear in MySQL 4.1.1: in that version, filtering (based on the server id of the event, to know if the event originates from ourselves) is done in the I/O thread, not in the SQL thread, so events will be ignored sooner in the process; so they will not affect Exec_master_log_pos.

Hi!
Good news: what you reported, which is not a bug, can still be considered as an annoyance. So we have just fixed it in MySQL 4.0.15.
That is, with 2 or 3 machines running with --log-slave-updates, you won't see unexpected values of Exec_master_log_pos anymore. You will always see, on machine B in Exec_master_log_pos, the position of the last executed event received from A (received from A, but could be that it originally came from B if you use --log-slave-updates; still in this case the position will be the position in A's binlog).
I hope this will help you for your synchronization & master-switch application.

Thank you, much appreciated!!!