Bug #27037 SHOW MASTER STATUS not returning correct position in 5.0
Submitted: 11 Mar 2007 22:49 Modified: 15 May 2007 5:30
Reporter: Arjen Lentz Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Replication Severity:S1 (Critical)
Version:5.0.38-BK, 5.0.36 OS:Linux (Linux, any)
Assigned to: Mats Kindahl CPU Architecture:Any
Tags: bfsm_2007_03_15, position, regression, replication, show master status

[11 Mar 2007 22:49] Arjen Lentz
Description:
LOCK ALL TABLES
SHOW SLAVE STATUS
Sometimes reports incorrect position. This can be verified by checking binlog.
In some cases extra data is included, in others data is missing.

Always works correctly in 4.1 but often not in 5.0.
Possibly an issue with transactions only (not certain).

How to repeat:
Example of case where data is missing:

# at 58853196
#070309 13:24:28 server id 893 end_log_pos 58853180 Query thread_id=186048 exec_time=0 error_code=0
SET TIMESTAMP=1173464668;
BEGIN;
# at 58853274
#070309 13:24:28 server id 893 end_log_pos 102 Query thread_id=186048 exec_time=0 error_code=0
SET TIMESTAMP=1173464668;
insert into foo values (1557);
# at 58853376
#070309 13:24:28 server id 893 end_log_pos 205 Query thread_id=186048 exec_time=0 error_code=0
SET TIMESTAMP=1173464668;
insert into foo2 values (1557);
# at 58853479
#070309 13:24:28 server id 893 end_log_pos 58853412 Xid = 120759241
COMMIT;
# at 58853506
#070309 13:24:28 server id 893 end_log_pos 58853490 Query thread_id=186049 exec_time=0 error_code=0
SET TIMESTAMP=1173464668;
BEGIN;
# at 58853584
#070309 13:24:28 server id 893 end_log_pos 102 Query thread_id=186049 exec_time=0 error_code=0
SET TIMESTAMP=1173464668;
insert into foo values (1558);
# at 58853686
#070309 13:24:28 server id 893 end_log_pos 205 Query thread_id=186049 exec_time=0 error_code=0
SET TIMESTAMP=1173464668;
insert into foo2 values (1558);
# at 58853789
#070309 13:24:28 server id 893 end_log_pos 58853722 Xid = 120759246
COMMIT;
# at 58853816
#070309 13:24:29 server id 893 end_log_pos 58853800 Query thread_id=186051 exec_time=0 error_code=0
SET TIMESTAMP=1173464669;
BEGIN;
# at 58853894
#070309 13:24:28 server id 893 end_log_pos 102 Query thread_id=186051 exec_time=1 error_code=0
SET TIMESTAMP=1173464668;
insert into foo values (1559);
# at 58853996
#070309 13:24:29 server id 893 end_log_pos 205 Query thread_id=186051 exec_time=0 error_code=0
SET TIMESTAMP=1173464669;
insert into foo2 values (1559);
# at 58854099
#070309 13:24:29 server id 893 end_log_pos 58854032 Xid = 120759258
COMMIT;
# at 58854126

After locking the tables "show master status" returns position 58853722. When I run mysqlbinlog with --stop-positon=58853722 here is the output at the end of the file:

# at 58853584
#070309 13:24:28 server id 893 end_log_pos 102 Query thread_id=186049 exec_time=0 error_code=0
SET TIMESTAMP=1173464668;
insert into foo values (1558);
# at 58853686
#070309 13:24:28 server id 893 end_log_pos 205 Query thread_id=186049 exec_time=0 error_code=0
SET TIMESTAMP=1173464668;
insert into foo2 values (1558);
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;

So position 58853722 is just before the commit so the commit isn't added and the transactions is rolled back.

Here is the commit transaction:

# at 58853789
#070309 13:24:28 server id 893 end_log_pos 58853722 Xid = 120759246
COMMIT;

so "show master status" returns 58853722 from the end_log_pos above but it probably should be returning 58853789 which is the actual position.

This explains where I'm getting fewer rows, but not when I get extra rows so that must be a different bug or the same bug manifested in a different way.
[11 Mar 2007 23:53] Arjen Lentz
Could this somehow be related to http://bugs.mysql.com/23890 ?
(mysqlbinlog outputs COMMIT unnecessarily when single database is used)
[12 Mar 2007 22:37] Arjen Lentz
*REGRESSION* bug does not occur in 4.1
[13 Mar 2007 22:08] Arjen Lentz
Could this be related to the 5.0 changes for XA ?
[13 Mar 2007 22:11] Mark Leith
Could this be related to the changes we made in binlogging towards XA/xid? The fact it works in 4.1 binlogs and not in 5.0 makes me highly suspicious this may be the cause..
[14 Mar 2007 9:01] Guilhem Bichot
Arjen,
even this is weird:
# at 58853196
#070309 13:24:28 server id 893 end_log_pos 58853180 Query thread_id=186048
exec_time=0 error_code=0
SET TIMESTAMP=1173464668;
BEGIN;
end_log_pos is 58853180 which is SMALLER than the start position given in the #at line (58853196). End if before start...
I suggest looking at the raw binary log file with an hex editor, to first check if the event really starts at the position given by #at and really ends at the position given by end_log_pos.
[15 Mar 2007 11:14] Valeriy Kravchuk
I had created the following test case for MySQL's test suite:

openxs@suse:~/dbs/5.0/mysql-test> cat t/rpl_27037.test
source include/have_innodb.inc;
source include/master-slave.inc;
connection master;
show master status;
drop table if exists foo;
drop table if exists foo2;
create table foo(c1 int) engine=InnoDB;
create table foo2(c1 int);
BEGIN;
insert into foo values (1557);
insert into foo2 values (1557);
COMMIT;
BEGIN;
insert into foo values (1558);
insert into foo2 values (1558);
COMMIT;
BEGIN;
insert into foo values (1559);
insert into foo2 values (1559);
COMMIT;
show master status;
sync_slave_with_master;
connection slave;
lock tables foo read, foo2 read;
show master status;
show slave status;
unlock tables;

But while at least one of the tables (foo and foo2) used in transaction is REALLY created as InnoDB, I have COMMIT in both master's binary log and slave's relay log. And positions are reported correctly (on latest 5.0.38-BK):

openxs@suse:~/dbs/5.0/mysql-test> cat r/rpl_27037.log
stop slave;
drop table if exists t1,t2,t3,t4,t5,t6,t7,t8,t9;
reset master;
reset slave;
drop table if exists t1,t2,t3,t4,t5,t6,t7,t8,t9;
start slave;
show master status;
File    Position        Binlog_Do_DB    Binlog_Ignore_DB
master-bin.000001       98
drop table if exists foo;
Warnings:
Note    1051    Unknown table 'foo'
drop table if exists foo2;
Warnings:
Note    1051    Unknown table 'foo2'
create table foo(c1 int) engine=InnoDB;
create table foo2(c1 int);
BEGIN;
insert into foo values (1557);
insert into foo2 values (1557);
COMMIT;
BEGIN;
insert into foo values (1558);
insert into foo2 values (1558);
COMMIT;
BEGIN;
insert into foo values (1559);
insert into foo2 values (1559);
COMMIT;
show master status;
File    Position        Binlog_Do_DB    Binlog_Ignore_DB
master-bin.000001       1302
lock tables foo read, foo2 read;
show master status;
File    Position        Binlog_Do_DB    Binlog_Ignore_DB
slave-bin.000001        1127
show slave status;
Slave_IO_State  Master_Host     Master_User     Master_Port     Connect_RetryMas
ter_Log_File    Read_Master_Log_Pos     Relay_Log_File  Relay_Log_Pos   Relay_Ma
ster_Log_File   Slave_IO_Running        Slave_SQL_Running       Replicate_Do_DBR
eplicate_Ignore_DB      Replicate_Do_Table      Replicate_Ignore_Table  Replicat
e_Wild_Do_Table Replicate_Wild_Ignore_Table     Last_Errno      Last_ErrorSkip_C
ounter  Exec_Master_Log_Pos     Relay_Log_Space Until_Condition Until_Log_FileUn
til_Log_Pos     Master_SSL_Allowed      Master_SSL_CA_File      Master_SSL_CA_Pa
th      Master_SSL_Cert Master_SSL_Cipher       Master_SSL_Key  Seconds_Behind_M
aster
Waiting for master to send event        127.0.0.1       root    9306    1master-
bin.000001      1302    slave-relay-bin.000003  1440    master-bin.000001YesYes0
                0       1302    1440    None            0       No      0
unlock tables;

openxs@suse:~/dbs/5.0/mysql-test> ../bin/mysqlbinlog var/log/slave-relay-bin.00
0003
...
# at 1160
#070315  5:18:29 server id 1  end_log_pos 1090  Query   thread_id=10    exec_tim
e=0     error_code=0
SET TIMESTAMP=1173928709/*!*/;
BEGIN/*!*/;
# at 1228
#070315  5:18:29 server id 1  end_log_pos 92    Query   thread_id=10    exec_tim
e=0     error_code=0
SET TIMESTAMP=1173928709/*!*/;
insert into foo values (1559)/*!*/;
# at 1320
#070315  5:18:29 server id 1  end_log_pos 185   Query   thread_id=10    exec_tim
e=0     error_code=0
SET TIMESTAMP=1173928709/*!*/;
insert into foo2 values (1559)/*!*/;
# at 1413
#070315  5:18:29 server id 1  end_log_pos 1302  Xid = 64
COMMIT/*!*/;
# at 1440
#070315  5:18:29 server id 2  end_log_pos 1459  Stop
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;

If both foo and foo2 are created as MyISAM for any reason, I do not see LAST COMMIT in master's binary log (all previous are there!). So, if engines are different on master and slave, we surely may have a problem.

Please, check my test case and reasoning.
[15 Mar 2007 14:32] Valeriy Kravchuk
While SHOW SLAVE STATUS shows correct position in my tests, SHOW MASTER STYATUS can really show incorrect position, just as described. Verified with latest 5.0.38-BK on Linux:

openxs@suse:~/dbs/5.0/mysql-test> ./mysql-test-run.pl rpl_27037
Logging: ./mysql-test-run.pl rpl_27037
MySQL Version 5.0.38
Using ndbcluster when necessary, mysqld supports it
Skipping SSL, mysqld not compiled with SSL
...

TEST                           RESULT         TIME (ms)
-------------------------------------------------------

rpl_27037                      [ fail ]

Errors are (from /home/openxs/dbs/5.0/mysql-test/var/log/mysqltest-time) :
/home/openxs/dbs/5.0/bin/mysqltest: Can't get stat of '/home/openxs/dbs/5.0/mysq
l-test/r/rpl_27037.result' (Errcode: 2)
(the last lines may be the most important ones)
Result from queries before failure can be found in r/rpl_27037.log

Aborting: rpl_27037 failed in default mode. To continue, re-run with '--force'.
Stopping All Servers
openxs@suse:~/dbs/5.0/mysql-test> cat r/rpl_27037.log
stop slave;
drop table if exists t1,t2,t3,t4,t5,t6,t7,t8,t9;
reset master;
reset slave;
drop table if exists t1,t2,t3,t4,t5,t6,t7,t8,t9;
start slave;
drop table if exists foo;
Warnings:
Note    1051    Unknown table 'foo'
drop table if exists foo2;
Warnings:
Note    1051    Unknown table 'foo2'
create table foo(c1 int) engine=InnoDB;
create table foo2(c1 int) engine=InnoDB;
BEGIN;
insert into foo values (1557);
insert into foo2 values (1557);
COMMIT;
BEGIN;
insert into foo values (1558);
insert into foo2 values (1558);
COMMIT;
BEGIN;
insert into foo values (1559);
insert into foo2 values (1559);
COMMIT;
show slave status;
Slave_IO_State  Master_Host     Master_User     Master_Port     Connect_RetryMas
ter_Log_File    Read_Master_Log_Pos     Relay_Log_File  Relay_Log_Pos   Relay_Ma
ster_Log_File   Slave_IO_Running        Slave_SQL_Running       Replicate_Do_DBR
eplicate_Ignore_DB      Replicate_Do_Table      Replicate_Ignore_Table  Replicat
e_Wild_Do_Table Replicate_Wild_Ignore_Table     Last_Errno      Last_ErrorSkip_C
ounter  Exec_Master_Log_Pos     Relay_Log_Space Until_Condition Until_Log_FileUn
til_Log_Pos     Master_SSL_Allowed      Master_SSL_CA_File      Master_SSL_CA_Pa
th      Master_SSL_Cert Master_SSL_Cipher       Master_SSL_Key  Seconds_Behind_M
aster
Queueing master event to the relay log  127.0.0.1       root    9306    1master-
bin.000001      4       slave-relay-bin.000002  4               Yes     Yes00028
9       None            0       No                                      0
stop slave;
start slave;
lock tables foo read, foo2 read;
show master status;
File    Position        Binlog_Do_DB    Binlog_Ignore_DB
slave-bin.000001        1141
show slave status;
Slave_IO_State  Master_Host     Master_User     Master_Port     Connect_RetryMas
ter_Log_File    Read_Master_Log_Pos     Relay_Log_File  Relay_Log_Pos   Relay_Ma
ster_Log_File   Slave_IO_Running        Slave_SQL_Running       Replicate_Do_DBR
eplicate_Ignore_DB      Replicate_Do_Table      Replicate_Ignore_Table  Replicat
e_Wild_Do_Table Replicate_Wild_Ignore_Table     Last_Errno      Last_ErrorSkip_C
ounter  Exec_Master_Log_Pos     Relay_Log_Space Until_Condition Until_Log_FileUn
til_Log_Pos     Master_SSL_Allowed      Master_SSL_CA_File      Master_SSL_CA_Pa
th      Master_SSL_Cert Master_SSL_Cipher       Master_SSL_Key  Seconds_Behind_M
aster
Waiting for master to send event        127.0.0.1       root    9306    1master-
bin.000001      1316    slave-relay-bin.000004  1454    master-bin.000001YesYes0
                0       1316    1454    None            0       No      6
unlock tables;

Note the following results of SHOW MASTER STATUS on slave:

slave-bin.000001        1141

Now let's get content of binary log till that position:

openxs@suse:~/dbs/5.0/mysql-test> ../bin/mysqlbinlog var/log/master-bin.000001
 --stop-position=1141
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#070315  6:48:27 server id 1  end_log_pos 98    Start: binlog v 4, server v 5.0.
38-log created 070315  6:48:27 at startup
ROLLBACK/*!*/;
# at 98
#070315  6:48:27 server id 1  end_log_pos 185   Query   thread_id=2     exec_tim
e=0     error_code=0
use test/*!*/;
SET TIMESTAMP=1173934107/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=1, @@session.uniq
ue_checks=1/*!*/;
SET @@session.sql_mode=0/*!*/;
/*!\C latin1 *//*!*/;
SET @@session.character_set_client=8,@@session.collation_connection=8,@@session.
collation_server=8/*!*/;
drop table if exists foo/*!*/;
# at 185
#070315  6:48:27 server id 1  end_log_pos 273   Query   thread_id=2     exec_tim
e=0     error_code=0
SET TIMESTAMP=1173934107/*!*/;
drop table if exists foo2/*!*/;
# at 273
#070315  6:48:27 server id 1  end_log_pos 374   Query   thread_id=2     exec_tim
e=0     error_code=0
SET TIMESTAMP=1173934107/*!*/;
create table foo(c1 int) engine=InnoDB/*!*/;
# at 374
#070315  6:48:27 server id 1  end_log_pos 476   Query   thread_id=2     exec_tim
e=0     error_code=0
SET TIMESTAMP=1173934107/*!*/;
create table foo2(c1 int) engine=InnoDB/*!*/;
# at 476
#070315  6:48:27 server id 1  end_log_pos 544   Query   thread_id=2     exec_tim
e=0     error_code=0
SET TIMESTAMP=1173934107/*!*/;
BEGIN/*!*/;
# at 544
#070315  6:48:27 server id 1  end_log_pos 92    Query   thread_id=2     exec_tim
e=0     error_code=0
SET TIMESTAMP=1173934107/*!*/;
insert into foo values (1557)/*!*/;
# at 636
#070315  6:48:27 server id 1  end_log_pos 185   Query   thread_id=2     exec_tim
e=0     error_code=0
SET TIMESTAMP=1173934107/*!*/;
insert into foo2 values (1557)/*!*/;
# at 729
#070315  6:48:27 server id 1  end_log_pos 756   Xid = 16
COMMIT/*!*/;
# at 756
#070315  6:48:27 server id 1  end_log_pos 824   Query   thread_id=2     exec_tim
e=0     error_code=0
SET TIMESTAMP=1173934107/*!*/;
BEGIN/*!*/;
# at 824
#070315  6:48:27 server id 1  end_log_pos 92    Query   thread_id=2     exec_tim
e=0     error_code=0
SET TIMESTAMP=1173934107/*!*/;
insert into foo values (1558)/*!*/;
# at 916
#070315  6:48:27 server id 1  end_log_pos 185   Query   thread_id=2     exec_tim
e=0     error_code=0
SET TIMESTAMP=1173934107/*!*/;
insert into foo2 values (1558)/*!*/;
# at 1009
#070315  6:48:27 server id 1  end_log_pos 1036  Xid = 20
COMMIT/*!*/;
# at 1036
#070315  6:48:27 server id 1  end_log_pos 1104  Query   thread_id=2     exec_tim
e=0     error_code=0
SET TIMESTAMP=1173934107/*!*/;
BEGIN/*!*/;
# at 1104
#070315  6:48:27 server id 1  end_log_pos 92    Query   thread_id=2     exec_tim
e=0     error_code=0
SET TIMESTAMP=1173934107/*!*/;
insert into foo values (1559)/*!*/;
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;

So, no final commit, just as described initially (and no insert to foo2)!
It is not clear what is that 1141 position reported at all! This looks like a bug for me. I used the following test case:

openxs@suse:~/dbs/5.0/mysql-test> cat t/rpl_27037.test
source include/have_innodb.inc;
source include/master-slave.inc;
connection master;
drop table if exists foo;
drop table if exists foo2;
create table foo(c1 int) engine=InnoDB;
create table foo2(c1 int) engine=InnoDB;
BEGIN;
insert into foo values (1557);
insert into foo2 values (1557);
COMMIT;
BEGIN;
insert into foo values (1558);
insert into foo2 values (1558);
COMMIT;
BEGIN;
insert into foo values (1559);
insert into foo2 values (1559);
COMMIT;
save_master_pos;
connection slave;
show slave status;
stop slave;
sleep 5;
start slave;
sync_with_master;
lock tables foo read, foo2 read;
show master status;
show slave status;
unlock tables;
[15 Mar 2007 14:49] Valeriy Kravchuk
What I missed in that test case is the fact that last 

show master status;

was executed on slave and refered to slave's binary log:

start slave;
lock tables foo read, foo2 read;
show master status;
File    Position        Binlog_Do_DB    Binlog_Ignore_DB
slave-bin.000001        1141

But still:

openxs@suse:~/dbs/5.0/mysql-test> ../bin/mysqlbinlog var/log/slave-bin.000001
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#070315  6:57:30 server id 2  end_log_pos 98    Start: binlog v 4, server v 5.0.
38-log created 070315  6:57:30 at startup
ROLLBACK/*!*/;
...
# at 956
#070315  6:57:31 server id 1  end_log_pos 1048  Query   thread_id=2     exec_tim
e=5     error_code=0
SET TIMESTAMP=1173934651/*!*/;
insert into foo values (1559)/*!*/;
# at 1048
#070315  6:57:31 server id 1  end_log_pos 1141  Query   thread_id=2     exec_tim
e=5     error_code=0
SET TIMESTAMP=1173934651/*!*/;
insert into foo2 values (1559)/*!*/;
# at 1141
#070315  6:57:36 server id 2  end_log_pos 1160  Stop
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;

I do not see COMMIT in that position in slave's binary log.
[15 May 2007 5:30] Mats Kindahl
This bug was fixed with the patch for BUG#27171.