Description:
I observed on my machine and pushbuild that rpl_sp_effects takes 10 secs in row-based replication and only 1 second in statement-based.
Here is why.
Master's binlog has this (./mtr --mem --ps-protocol
--mysqld=--binlog-format=row rpl_sp_effects)
# at 879
# at 918
#774 22:5:45 server id 1 end_log_pos 918 Table_map: `test`.`t1` mapped to
number 18
#774 22:5:45 server id 1 end_log_pos 952 Write_rows: table id 18 flags: S
TMT_END_F
BINLOG '
mf2LRhMBAAAAJwAAAJYDAAAAABIAAAAAAAAABHRlc3QAAnQxAAED
mf2LRhcBAAAAIgAAALgDAAAQABIAAAAAAAEAAf/+BQAAAA==
'/*!*/;
# at 952
#774 22:5:45 server id 1 end_log_pos 1127 Query thread_id=2 exec_tim
e=0 error_code=0
SET TIMESTAMP=1183579545/*!*/;
CREATE DEFINER=`root`@`localhost` procedure p2()
begin
declare a int default 4;
create table t2 as select a;
end/*!*/;
# at 1127
#774 22:5:45 server id 1 end_log_pos 122 Query thread_id=2 exec_tim
e=0 error_code=0
SET TIMESTAMP=1183579545/*!*/;
SET @@session.time_zone='SYSTEM'/*!*/;
CREATE TABLE `t2` (
`a` bigint(11) DEFAULT NULL
)/*!*/;
# at 1249
# at 1288
#774 22:5:45 server id 1 end_log_pos 161 Table_map: `test`.`t2` mapped to
number 19
#774 22:5:45 server id 1 end_log_pos 199 Write_rows: table id 19 flags: S
TMT_END_F
BINLOG '
mf2LRhMBAAAAJwAAAKEAAAAAABMAAAAAAAAABHRlc3QAAnQyAAEI
mf2LRhcBAAAAJgAAAMcAAAAQABMAAAAAAAEAAf/+BAAAAAAAAAA=
'/*!*/;
# at 1326
#774 22:5:55 server id 1 end_log_pos 1406 Query thread_id=2 exec_tim
e=0 error_code=0
SET TIMESTAMP=1183579555/*!*/;
drop procedure p1/*!*/;
See the problem: end_log_pos goes from 1127 to 122, that cannot be
correct. 1127 is correct; 122 is wrong. This CREATE TABLE `t2` has a
wrong end_log_pos, probably because:
- it is from "create table t2 as select a" (execution of p2())
- in RBR, "create table t2 as select a" is binlogged as the CREATE
TABLE (from SHOW CREATE TABLE) and then the inserted rows; both events
(the CREATE and the binrows) probably first go to a cache and then
are flushed to binlog, a bit like when one commits a transaction; then it's
the known problem of BUG#22540: end_log_pos of events in caches is not
correct.
The consequence of these wrong 122, 161, 199 is:
slave executes the CREATE TABLE `t2` and the following binrows, then
considers it has executed up to position 199 (down from 1227!). SHOW
SLAVE STATUS tells that SQL thread is at 199 (wrong) and IO thread is
at 1326 (correct; how come it works: it's because slave IO thread
ignores log_pos, it just adds events' sizes as it receives events), then
in the test there is a sync_slave_with_master: as master's real
position (read by sync_slave_with_master using SHOW MASTER STATUS on
master) is 1326, a master_pos_wait(1326) is made, which has no chance
to succeed (as slave thinks it is as 199).
Then as this is a blocked situation, master is idle, and so after 10
seconds the master<->slave connection times out (slave_net_timeout),
slave reconnects asking "send me binlog from where IO thread stopped"
(1326) then a special ("fake") Rotate event (the infamous ones of
BUG#20435) is sent by master to say "this is a binlog dump from position 1326"; slave SQL thread sees this event and says "ah, so now I have reached 1326",
and so master_pos_wait() unblocks.
That's why the test finally passes after the timeout elapsed.
But the bug is really serious. 199 is a position which never existed
in the master's binlog (and even if it existed, it is not where the
SQL thread really is); if user stops the slave and does a CHANGE
MASTER to just change a parameter (like the password of the user to
connect with), replication will not restart (as it will ask master for
position 199), or will restart from where it should not be, executing statements which have already been, corrupting data...
How to repeat:
observe times of rpl_sp_effects in team tree 5.1-new-rpl
Suggested fix:
BUG#22540 will fix this, as it ensures that correct end_los_pos are written when flushing a cache to the binlog.
I am not marking this as duplicate of BUG#22540, because testcase is really different (no transactions here).