Bug #47210 | first execution of "start slave until" stops too early | ||
---|---|---|---|
Submitted: | 9 Sep 2009 8:36 | Modified: | 12 Mar 2010 17:31 |
Reporter: | Andrii Nikitin | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S3 (Non-critical) |
Version: | 5.0.60+, 5.1.37 | OS: | Any |
Assigned to: | Andrei Elkin | CPU Architecture: | Any |
[9 Sep 2009 8:36]
Andrii Nikitin
[9 Sep 2009 8:47]
Andrii Nikitin
WARNING this script will clean up all sandbox servers
Attachment: repltest.sh (application/x-shellscript, text), 6.15 KiB.
[9 Sep 2009 8:47]
Andrii Nikitin
WARNING this script will clean up all sandbox servers
Attachment: repl47210.sh (application/x-shellscript, text), 468 bytes.
[9 Sep 2009 9:52]
Andrii Nikitin
log demosntrating the problem
Attachment: repl47210.log (text/x-log), 3.26 KiB.
[16 Oct 2009 12:57]
Andrei Elkin
Andrii, hello. It seems i rashed yesterday with the positive verification on my site. Actually today executing logics of your scripts for few times has not showed the reported symptom. UNTIL has been always == EXEC position as it's specified. I wonder if the problem indeed is either your report time version was vulnerable or the script's logics relies on a sleep time and 5 secs somehow was not enough to get UNTIL == EXEC in sync. I am asking you to re-verify it all increasing sleep time to a sufficiently high value. Cheers, The namesake :-)
[16 Oct 2009 15:59]
Andrii Nikitin
Andrei, thanks into looking into it. I've tested with 5.0.86->5.0.86 with 25 sec sleep and much higher offset for first iteration: [*fail*] slave 5_0_86 has bug: Until_Log_Pos (179293) must be equal Exec_Master_Log_Pos (179207). Also tested 5.1.39 (the same problem) and 5.0.58 (no bug).
[2 Nov 2009 8:58]
Andrii Nikitin
Hi Andrei, I managed to find shorter scenario to repeat the problem. (tested on mysql-5.0-bugteam from 30-Oct). Please check if the steps give same results for you. S: change master to MASTER_HOST='127.0.0.1', MASTER_USER='root', MASTER_PASSWORD=''; start slave; M: create database d; use d; CREATE TABLE t(id int) engine=myisam; show master status; +-----------------+----------+--------------+------------------+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | +-----------------+----------+--------------+------------------+ | quad-bin.000001 | 271 | | | +-----------------+----------+--------------+------------------+ S: stop slave; show slave status\G Master_Log_File: quad-bin.000001 Read_Master_Log_Pos: 271 Relay_Log_File: quad-relay-bin.000002 Relay_Log_Pos: 407 Relay_Master_Log_File: quad-bin.000001 Slave_IO_Running: No Slave_SQL_Running: No Exec_Master_Log_Pos: 271 Relay_Log_Space: 407 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 M: insert into t values(1); show master status; +-----------------+----------+--------------+------------------+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | +-----------------+----------+--------------+------------------+ | quad-bin.000001 | 354 | | | +-----------------+----------+--------------+------------------+ insert into t values(1); insert into t values(1); insert into t values(1); show master status; +-----------------+----------+--------------+------------------+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | +-----------------+----------+--------------+------------------+ | quad-bin.000001 | 603 | | | +-----------------+----------+--------------+------------------+ S: start slave until master_log_file='quad-bin.000001', master_log_pos=354; select sleep(5); show slave status\G Master_Log_File: quad-bin.000001 Read_Master_Log_Pos: 603 Relay_Log_File: quad-relay-bin.000002 Relay_Log_Pos: 407 Relay_Master_Log_File: quad-bin.000001 Slave_IO_Running: Yes Slave_SQL_Running: No Exec_Master_Log_Pos: 271 <<--- (!!!) Relay_Log_Space: 1021 | Until_Condition: Master | Until_Log_File: quad-bin.000001 | Until_Log_Pos: 354 <<----- (!!!) Exec_Master_Log_Pos expected to be increased to 354 . In later calls START UNTIL worked properly, so, the problem looks occurring only with first START UNTIL if master_log_pos is little higher than Exec_Master_Log_Pos Please let me know if something works differently on your side or this is expected behavior.
[11 Nov 2009 8:44]
Andrei Elkin
Andrii, thank you again for help to verify. My earlier inability to make it seems to be caused by a default way to small slave_net_timeout so that relay log could grow a few events more being rotated away. I reproduced it and now it's clear that there is a mixup in comparing the UNTIL (master) and the log position of a local (slave instantiated) event. To be continued with technical fixing very soon. Thanks! The namesake :-)
[12 Nov 2009 15:11]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/90253 2820 Andrei Elkin 2009-11-12 Bug #47210 first execution of "start slave until" stops too early Until-pos guarding did not distiguish the master originated events from ones that the slave can introduce to the relay log e.g Rotate to the next relay log at slave restarting. The local Rotate's coordinate are incomparable with the Until-master-pos. That led to the unexpectable stop this bug describes. Fixed with to avoid Until-master-pos comparison for a local slave's event. Notice that if --replicate-same-server is true such event is treated as coming from the master side. @ mysql-test/r/rpl_until.result results changed. @ mysql-test/t/rpl_until.test regression test for bug#47210 is added. @ sql/slave.cc st_relay_log_info::is_until_satisfied() is augmented with avoidance of Until-master-pos comparison for a local slave's event. if --replicate-same-server is true such event is treated as coming from the master side. @ sql/slave.h signature of is_until_satisfied() changed to supply THD and Event to the routine.
[14 Dec 2009 16:58]
Andrei Elkin
Pushed to 5.0-bt -> 5.1-bt -> mysql-pe.
[19 Dec 2009 8:28]
Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20091219082307-f3i4fn0tm8trb3c0) (version source revid:alik@sun.com-20091216180721-eoa754i79j4ssd3m) (merge vers: 6.0.14-alpha) (pib:15)
[19 Dec 2009 8:32]
Bugs System
Pushed into 5.5.1-m2 (revid:alik@sun.com-20091219082021-f34nq4jytwamozz0) (version source revid:alexey.kopytov@sun.com-20091216134707-o96eqw0u2ynvo9gm) (merge vers: 5.5.0-beta) (pib:15)
[19 Dec 2009 8:36]
Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20091219082213-nhjjgmphote4ntxj) (version source revid:alik@sun.com-20091216180221-a5ps59gajad3pip9) (pib:15)
[11 Jan 2010 17:50]
Paul DuBois
Noted in 5.5.1, 6.0.14 changelogs. The first execution of STOP SLAVE UNTIL stopped too early. Setting report to NDI pending push to 5.1.x, Celosia.
[15 Jan 2010 9:00]
Bugs System
Pushed into 5.1.43 (revid:joro@sun.com-20100115085139-qkh0i0fpohd9u9p5) (version source revid:alexey.kopytov@sun.com-20091214172941-yqe4f478evi8347x) (merge vers: 5.1.42) (pib:16)
[15 Jan 2010 18:34]
Paul DuBois
Noted in 5.1.43 changelog. Setting report to NDI pending push to Celosia.
[12 Mar 2010 14:20]
Bugs System
Pushed into 5.1.44-ndb-7.0.14 (revid:jonas@mysql.com-20100312135944-t0z8s1da2orvl66x) (version source revid:jonas@mysql.com-20100312115609-woou0te4a6s4ae9y) (merge vers: 5.1.44-ndb-7.0.14) (pib:16)
[12 Mar 2010 14:34]
Bugs System
Pushed into 5.1.44-ndb-6.2.19 (revid:jonas@mysql.com-20100312134846-tuqhd9w3tv4xgl3d) (version source revid:jonas@mysql.com-20100312060623-mx6407w2vx76h3by) (merge vers: 5.1.44-ndb-6.2.19) (pib:16)
[12 Mar 2010 14:50]
Bugs System
Pushed into 5.1.44-ndb-6.3.33 (revid:jonas@mysql.com-20100312135724-xcw8vw2lu3mijrhn) (version source revid:jonas@mysql.com-20100312103652-snkltsd197l7q2yg) (merge vers: 5.1.44-ndb-6.3.33) (pib:16)
[12 Mar 2010 17:31]
Paul DuBois
Fixed in earlier 5.1.x, 5.5.x.