Bug #47210 first execution of "start slave until" stops too early
Submitted: 9 Sep 2009 8:36 Modified: 12 Mar 2010 17:31
Reporter: Andrii Nikitin Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:5.0.60+, 5.1.37 OS:Any
Assigned to: Andrei Elkin CPU Architecture:Any

[9 Sep 2009 8:36] Andrii Nikitin
Description:
In 5.0.60+ replication START SLAVE UNTIL stops too early when issued for first time.

(also see related bug #47142)

How to repeat:
1. install 

make_replication_sandbox mysql-enterprise-gpl-5.0.60-linux-x86_64-glibc23.tar.gz
make_replication_sandbox mysql-enterprise-gpl-5.0.85-linux-x86_64-glibc23.tar.gz
make_replication_sandbox mysql-advanced-gpl-5.1.37-linux-x86_64-glibc23.tar.gz

2. Put attached scripts into sandbox directory and run
./repl27210.sh | grep "\["

3. See output like below (also attached log)
[info] starting tests master=5_0_60 slaves=5_0_60 5_0_70 5_0_85 5_1_37    
[info] iteration 1 with 20 inserts
[info] start slaves UNTIL "mysql-bin.000001", "2034"
[*fail*] slave 5_0_60: Until_Log_Pos (2034) must be equal Exec_Master_Log_Pos (1948)
[*fail*] slave 5_0_70: Until_Log_Pos (2034) must be equal Exec_Master_Log_Pos (1948)
[*fail*] slave 5_0_85: Until_Log_Pos (2034) must be equal Exec_Master_Log_Pos (1948)
[*fail*] slave 5_1_37: Until_Log_Pos (2034) must be equal Exec_Master_Log_Pos (1948)
[info] iteration 2 with 100 inserts
[info] start slaves UNTIL "mysql-bin.000001", "12692"
[pass] slave 5_0_60 shows no problems. 
[pass] slave 5_0_70 shows no problems. 
[pass] slave 5_0_85 shows no problems. 
[pass] slave 5_1_37 shows no problems. 
[info] iteration 3 with 1000 inserts
[info] start slaves UNTIL "mysql-bin.000001", "110810"
[pass] slave 5_0_60 shows no problems. 
[pass] slave 5_0_70 shows no problems. 
[pass] slave 5_0_85 shows no problems. 
[pass] slave 5_1_37 shows no problems. 

Suggested fix:
Until_Log_Pos must be equal to Exec_Master_Log_Pos when SQL thread stops after  START SLAVE UNTIL executed
[9 Sep 2009 8:47] Andrii Nikitin
WARNING this script will clean up all sandbox servers

Attachment: repltest.sh (application/x-shellscript, text), 6.15 KiB.

[9 Sep 2009 8:47] Andrii Nikitin
WARNING this script will clean up all sandbox servers

Attachment: repl47210.sh (application/x-shellscript, text), 468 bytes.

[9 Sep 2009 9:52] Andrii Nikitin
log demosntrating the problem

Attachment: repl47210.log (text/x-log), 3.26 KiB.

[16 Oct 2009 12:57] Andrei Elkin
Andrii, hello.

It seems i rashed yesterday with the positive verification on my
site. Actually today executing logics of your scripts for few times
has not showed the reported symptom. UNTIL has been always == EXEC
position as it's specified.

I wonder if the problem indeed is either your report time version was
vulnerable or the script's logics relies on a sleep time and 5 secs
somehow was not enough to get UNTIL == EXEC in sync.

I am asking you to re-verify it all increasing sleep time to a
sufficiently high value.

Cheers,

The namesake :-)
[16 Oct 2009 15:59] Andrii Nikitin
Andrei,
thanks into looking into it. 

I've tested with 5.0.86->5.0.86 with 25 sec sleep and much higher offset for first iteration:
[*fail*] slave 5_0_86 has bug: Until_Log_Pos (179293) must be equal Exec_Master_Log_Pos (179207).

Also tested 5.1.39 (the same problem) and 5.0.58 (no bug).
[2 Nov 2009 8:58] Andrii Nikitin
Hi Andrei,

I managed to find shorter scenario to repeat the problem. (tested on mysql-5.0-bugteam from 30-Oct).

Please check if the steps give same results for you.

S:
change master to MASTER_HOST='127.0.0.1', MASTER_USER='root', MASTER_PASSWORD='';
start slave;

M:
create database d;
use d;
CREATE TABLE t(id int) engine=myisam;
show master status;
+-----------------+----------+--------------+------------------+
| File            | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+-----------------+----------+--------------+------------------+
| quad-bin.000001 |      271 |              |                  |
+-----------------+----------+--------------+------------------+

S:
stop slave;
show slave status\G
            Master_Log_File: quad-bin.000001
        Read_Master_Log_Pos: 271
             Relay_Log_File: quad-relay-bin.000002
              Relay_Log_Pos: 407
      Relay_Master_Log_File: quad-bin.000001
           Slave_IO_Running: No
          Slave_SQL_Running: No
        Exec_Master_Log_Pos: 271
            Relay_Log_Space: 407
            Until_Condition: None
             Until_Log_File:
              Until_Log_Pos: 0

M:
insert into t values(1);
show master status;
+-----------------+----------+--------------+------------------+
| File            | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+-----------------+----------+--------------+------------------+
| quad-bin.000001 |      354 |              |                  |
+-----------------+----------+--------------+------------------+

insert into t values(1);
insert into t values(1);
insert into t values(1);

show master status;
+-----------------+----------+--------------+------------------+
| File            | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+-----------------+----------+--------------+------------------+
| quad-bin.000001 |      603 |              |                  |
+-----------------+----------+--------------+------------------+

S:
start slave until master_log_file='quad-bin.000001', master_log_pos=354;
select sleep(5);
show slave status\G
            Master_Log_File: quad-bin.000001
        Read_Master_Log_Pos: 603
             Relay_Log_File: quad-relay-bin.000002
              Relay_Log_Pos: 407
      Relay_Master_Log_File: quad-bin.000001
           Slave_IO_Running: Yes
          Slave_SQL_Running: No
        Exec_Master_Log_Pos: 271    <<--- (!!!)
            Relay_Log_Space: 1021            |
            Until_Condition: Master          |
             Until_Log_File: quad-bin.000001 |
              Until_Log_Pos: 354       <<-----

(!!!) Exec_Master_Log_Pos expected to be increased to 354 . In later calls START UNTIL worked properly, 
so, the problem looks occurring only with first START UNTIL if master_log_pos is little higher than  Exec_Master_Log_Pos

Please let me know if something works differently on your side or this is expected behavior.
[11 Nov 2009 8:44] Andrei Elkin
Andrii, thank you again for help to verify. My earlier inability to
make it seems to be caused by a default way to small
slave_net_timeout so that relay log could grow a few events more being
rotated away.

I reproduced it and now it's clear that there is a mixup in comparing
the UNTIL (master) and the log position of a local (slave instantiated) event.

To be continued with technical fixing very soon.

Thanks!

The namesake :-)
[12 Nov 2009 15:11] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/90253

2820 Andrei Elkin	2009-11-12
      Bug #47210   first execution of "start slave until" stops too early  
      
      Until-pos guarding did not distiguish the master originated events from ones that the slave 
      can introduce to the relay log e.g Rotate to the next relay log at slave restarting.
      The local Rotate's coordinate are incomparable with the Until-master-pos.
      That led to the unexpectable stop this bug describes.
      
      Fixed with to avoid Until-master-pos comparison for a local slave's event.
      Notice that if --replicate-same-server is true such event is treated as coming from
      the master side.
     @ mysql-test/r/rpl_until.result
        results changed.
     @ mysql-test/t/rpl_until.test
        regression test for bug#47210 is added.
     @ sql/slave.cc
        st_relay_log_info::is_until_satisfied() is augmented with avoidance of 
        Until-master-pos comparison for a local slave's event.
        if --replicate-same-server is true such event is treated as coming from
        the master side.
     @ sql/slave.h
        signature of is_until_satisfied() changed to supply THD and Event to the routine.
[14 Dec 2009 16:58] Andrei Elkin
Pushed to 5.0-bt -> 5.1-bt -> mysql-pe.
[19 Dec 2009 8:28] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20091219082307-f3i4fn0tm8trb3c0) (version source revid:alik@sun.com-20091216180721-eoa754i79j4ssd3m) (merge vers: 6.0.14-alpha) (pib:15)
[19 Dec 2009 8:32] Bugs System
Pushed into 5.5.1-m2 (revid:alik@sun.com-20091219082021-f34nq4jytwamozz0) (version source revid:alexey.kopytov@sun.com-20091216134707-o96eqw0u2ynvo9gm) (merge vers: 5.5.0-beta) (pib:15)
[19 Dec 2009 8:36] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20091219082213-nhjjgmphote4ntxj) (version source revid:alik@sun.com-20091216180221-a5ps59gajad3pip9) (pib:15)
[11 Jan 2010 17:50] Paul DuBois
Noted in 5.5.1, 6.0.14 changelogs.

The first execution of STOP SLAVE UNTIL stopped too early.

Setting report to NDI pending push to 5.1.x, Celosia.
[15 Jan 2010 9:00] Bugs System
Pushed into 5.1.43 (revid:joro@sun.com-20100115085139-qkh0i0fpohd9u9p5) (version source revid:alexey.kopytov@sun.com-20091214172941-yqe4f478evi8347x) (merge vers: 5.1.42) (pib:16)
[15 Jan 2010 18:34] Paul DuBois
Noted in 5.1.43 changelog.

Setting report to NDI pending push to Celosia.
[12 Mar 2010 14:20] Bugs System
Pushed into 5.1.44-ndb-7.0.14 (revid:jonas@mysql.com-20100312135944-t0z8s1da2orvl66x) (version source revid:jonas@mysql.com-20100312115609-woou0te4a6s4ae9y) (merge vers: 5.1.44-ndb-7.0.14) (pib:16)
[12 Mar 2010 14:34] Bugs System
Pushed into 5.1.44-ndb-6.2.19 (revid:jonas@mysql.com-20100312134846-tuqhd9w3tv4xgl3d) (version source revid:jonas@mysql.com-20100312060623-mx6407w2vx76h3by) (merge vers: 5.1.44-ndb-6.2.19) (pib:16)
[12 Mar 2010 14:50] Bugs System
Pushed into 5.1.44-ndb-6.3.33 (revid:jonas@mysql.com-20100312135724-xcw8vw2lu3mijrhn) (version source revid:jonas@mysql.com-20100312103652-snkltsd197l7q2yg) (merge vers: 5.1.44-ndb-6.3.33) (pib:16)
[12 Mar 2010 17:31] Paul DuBois
Fixed in earlier 5.1.x, 5.5.x.