Bug #29558 RBR: CREATE SELECT has wrong binlog position
Submitted: 4 Jul 2007 20:24 Modified: 13 Jul 2007 16:33
Reporter: Guilhem Bichot Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Row Based Replication ( RBR ) Severity:S2 (Serious)
Version:5.1-bk OS:Linux
Assigned to: Tatiana Azundris Nuernberg CPU Architecture:Any

[4 Jul 2007 20:24] Guilhem Bichot
Description:
I observed on my machine and pushbuild that rpl_sp_effects takes 10 secs in row-based replication and only 1 second in statement-based.

Here is why.
Master's binlog has this (./mtr --mem --ps-protocol
--mysqld=--binlog-format=row rpl_sp_effects)

# at 879
# at 918
#774 22:5:45 server id 1  end_log_pos 918       Table_map: `test`.`t1` mapped to
 number 18
#774 22:5:45 server id 1  end_log_pos 952       Write_rows: table id 18 flags: S
TMT_END_F

BINLOG '
mf2LRhMBAAAAJwAAAJYDAAAAABIAAAAAAAAABHRlc3QAAnQxAAED
mf2LRhcBAAAAIgAAALgDAAAQABIAAAAAAAEAAf/+BQAAAA==
'/*!*/;
# at 952
#774 22:5:45 server id 1  end_log_pos 1127      Query   thread_id=2     exec_tim
e=0     error_code=0
SET TIMESTAMP=1183579545/*!*/;
CREATE DEFINER=`root`@`localhost` procedure p2()
begin
declare a int default 4;
create table t2 as select a;
end/*!*/;
# at 1127
#774 22:5:45 server id 1  end_log_pos 122       Query   thread_id=2     exec_tim
e=0     error_code=0
SET TIMESTAMP=1183579545/*!*/;
SET @@session.time_zone='SYSTEM'/*!*/;
CREATE TABLE `t2` (
  `a` bigint(11) DEFAULT NULL
)/*!*/;
# at 1249
# at 1288
#774 22:5:45 server id 1  end_log_pos 161       Table_map: `test`.`t2` mapped to
 number 19
#774 22:5:45 server id 1  end_log_pos 199       Write_rows: table id 19 flags: S
TMT_END_F

BINLOG '
mf2LRhMBAAAAJwAAAKEAAAAAABMAAAAAAAAABHRlc3QAAnQyAAEI
mf2LRhcBAAAAJgAAAMcAAAAQABMAAAAAAAEAAf/+BAAAAAAAAAA=
'/*!*/;
# at 1326
#774 22:5:55 server id 1  end_log_pos 1406      Query   thread_id=2     exec_tim
e=0     error_code=0
SET TIMESTAMP=1183579555/*!*/;
drop procedure p1/*!*/;

See the problem: end_log_pos goes from 1127 to 122, that cannot be
correct. 1127 is correct; 122 is wrong. This CREATE TABLE `t2` has a
wrong end_log_pos, probably because:
- it is from "create table t2 as select a" (execution of p2())
- in RBR, "create table t2 as select a" is binlogged as the CREATE
TABLE (from SHOW CREATE TABLE) and then the inserted rows; both events
(the CREATE and the binrows) probably first go to a cache and then
are flushed to binlog, a bit like when one commits a transaction; then it's
the known problem of BUG#22540: end_log_pos of events in caches is not
correct.

The consequence of these wrong 122, 161, 199 is:
slave executes the CREATE TABLE `t2` and the following binrows, then
considers it has executed up to position 199 (down from 1227!). SHOW
SLAVE STATUS tells that SQL thread is at 199 (wrong) and IO thread is
at 1326 (correct; how come it works: it's because slave IO thread
ignores log_pos, it just adds events' sizes as it receives events), then
in the test there is a sync_slave_with_master: as master's real
position (read by sync_slave_with_master using SHOW MASTER STATUS on
master) is 1326, a master_pos_wait(1326) is made, which has no chance
to succeed (as slave thinks it is as 199).
Then as this is a blocked situation, master is idle, and so after 10
seconds the master<->slave connection times out (slave_net_timeout),
slave reconnects asking "send me binlog from where IO thread stopped"
(1326) then a special ("fake") Rotate event (the infamous ones of
BUG#20435) is sent by master to say "this is a binlog dump from position 1326"; slave SQL thread sees this event and says "ah, so now I have reached 1326",
and so master_pos_wait() unblocks.

That's why the test finally passes after the timeout elapsed.

But the bug is really serious. 199 is a position which never existed
in the master's binlog (and even if it existed, it is not where the
SQL thread really is); if user stops the slave and does a CHANGE
MASTER to just change a parameter (like the password of the user to
connect with), replication will not restart (as it will ask master for
position 199), or will restart from where it should not be, executing statements which have already been, corrupting data...

How to repeat:
observe times of rpl_sp_effects in team tree 5.1-new-rpl

Suggested fix:
BUG#22540 will fix this, as it ensures that correct end_los_pos are written when flushing a cache to the binlog.
I am not marking this as duplicate of BUG#22540, because testcase is really different (no transactions here).
[4 Jul 2007 20:45] Guilhem Bichot
other tests which contain CREATE TABLE SELECT and run in 10.x secs in RBR for the same reason:
rpl_replicate_do
rpl_row_create_table
[13 Jul 2007 16:33] Tatiana Azundris Nuernberg
Fixed by patch for #22540, mysqlbinlog now renders corresponding end_log_pos/@:

# at 952
#7713 18:15:53 server id 1  end_log_pos 1127    Query   thread_id=2     exec_time=0     error_code=0
SET TIMESTAMP=1184343353/*!*/;
CREATE DEFINER=`root`@`localhost` procedure p2()
begin
declare a int default 4;
create table t2 as select a;
end/*!*/;
# at 1127
#7713 18:15:53 server id 1  end_log_pos 1249    Query   thread_id=2     exec_time=0     error_code=0
SET TIMESTAMP=1184343353/*!*/;
SET @@session.time_zone='SYSTEM'/*!*/;
CREATE TABLE `t2` (
  `a` bigint(11) DEFAULT NULL
)/*!*/;
# at 1249
# at 1288
#7713 18:15:53 server id 1  end_log_pos 1288    Table_map: `test`.`t2` mapped to number 19
#7713 18:15:53 server id 1  end_log_pos 1326    Write_rows: table id 19 flags: STMT_END_F

BINLOG '
OaWXRhMBAAAAJwAAAAgFAAAAABMAAAAAAAAABHRlc3QAAnQyAAEI
OaWXRhcBAAAAJgAAAC4FAAAQABMAAAAAAAEAAf/+BAAAAAAAAAA=
'/*!*/;
# at 1326
#7713 18:15:53 server id 1  end_log_pos 1406    Query   thread_id=2     exec_time=0     error_code=0
SET TIMESTAMP=1184343353/*!*/;
drop procedure p1/*!*/;
# at 1406
#7713 18:15:53 server id 1  end_log_pos 1486    Query   thread_id=2     exec_time=0     error_code=0