Bug #47142 "slave start until" stops 1 event too late in 4.1 to 5.0 replication
Submitted: 4 Sep 2009 16:07 Modified: 15 Mar 2010 4:47
Reporter: Andrii Nikitin Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:5.0.60+ OS:Any
Assigned to: Andrei Elkin CPU Architecture:Any
Triage: Triaged: D3 (Medium)

[4 Sep 2009 16:07] Andrii Nikitin
Description:
When replicating 4.1 to 5.0.60 or later START SLAVE UNTIL stops too late.

Looks regression of bug #13861 because is not reproducible in 5.0.58.

How to repeat:
1. Set up 4.1 -> 5.0.60+ replication
2. stop slave
3. do some updates to master
4. remember 'File' and 'Position' from 'SHOW MASTER STATUS' output
5. do some updates to master
6. execute with values from step #4:
START SLAVE UNTIL  master_log_file='mysql-bin.000003', master_log_pos=437;
7. Note that "Exec_Master_Log_Pos" value in SHOW SLAVE STATUS output is greater than position at step 6.

Note that 5.0->5.0 and 4.1->5.0.58 works properly ("Exec_Master_Log_Pos" is equal to "Until_Log_Pos")

Suggested fix:
do not over-process master binary log
[4 Sep 2009 16:09] Andrii Nikitin
listing demosntrates problem

Attachment: listing4125to5060.txt (text/plain), 5.25 KiB.

[9 Sep 2009 10:53] Andrii Nikitin
When testing this problem I realized that in replication from any 5.0 master first execution of START SLAVE UNTIL stops too early. 
I opened separate bugreport bug #47210 because different versions of MASTER are involved into  bug and it is harder to reproduce.

One can use automated scripts from bug #47210 to demonstrate problem in this bug.

So, 
4.1 -> 5.0.60+ bug #47142  (replication to 5.1 is not officially supported here)
5.0 -> (5.0.60+ or 5.1) bug #47210 (unstable, only first run of START SLAVE UNTIL)

See repltest.sh from bug #47210 and attached here repl47142.log and repl47142.sh
[9 Sep 2009 11:04] Andrii Nikitin
script to run with repltest.sh from bug #47210

Attachment: repl47142.sh (application/x-shellscript, text), 456 bytes.

[9 Sep 2009 11:10] Andrii Nikitin
log demosntrating the problem

Attachment: repl47142.log (text/x-log), 3.34 KiB.

[13 Nov 2009 19:21] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/90394

2845 Andrei Elkin	2009-11-13
      Bug #47142 "slave start until" stops 1 event too late in 4.1 to 5.0 replication
      
      When replicating from 4.1 master to 5.0 slave START SLAVE UNTIL can stop too late.
      The necessary in calculating of the beginning of an event the event's length
      did not correspond to the master's genuine information at the event's execution time.
      That piece of info was changed at the event's relay-logging.
      
      Fixed with storing the master genuine Query_log_event size into a new status
      variable at relay-logging of the event. The stored info is extacted at the event
      execution and participate further in the until-pos stopping routine.
      The new status variable's algorithm will be only active when the event comes
      from the master of version less than 5.0.
       
     @ sql/log_event.cc
        Storing and extracting the master's genuine size of the event from the status
        var of the event packet header.
     @ sql/log_event.h
        Incrementing the max szie of MAX_SIZE_LOG_EVENT_STATUS because of the new status var;
        Defining the new status variable to hold the master's genuine event size;
        Augmending the Query_log_event with a new member to hold a value to store/extact from the status
        var of the event packet header.
[22 Jan 2010 19:34] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/97925

2845 Andrei Elkin	2010-01-22
      Bug #47142 "slave start until" stops 1 event too late in 4.1 to 5.0 replication
      
      When replicating from 4.1 master to 5.0 slave START SLAVE UNTIL can stop too late.
      The necessary in calculating of the beginning of an event the event's length
      did not correspond to the master's genuine information at the event's execution time.
      That piece of info was changed at the event's relay-logging due to binlog_version<4 event
      conversion by IO thread.
      
      Fixed with storing the master genuine Query_log_event size into a new status
      variable at relay-logging of the event. The stored info is extacted at the event
      execution and participate further to caclulate the correct start position of the event
      in the until-pos stopping routine.
      The new status variable's algorithm will be only active when the event comes
      from the master of version < 5.0 (binlog_version < 4).
     @ mysql-test/r/rpl_until.result
        results changed.
     @ mysql-test/std_data/bug47142_master-bin.000001
        a binlog from 4.1 master to replace one of the running 5.x master is added as 
        part of Bug #47142 regression test.
     @ mysql-test/t/rpl_until.test
        Regression test for Bug #47142 is added.
     @ sql/log_event.cc
        Storing and extracting the master's genuine size of the event from the status
        var of the event packet header.
        The binlog_version<4 query-log-event is 
        a. converted into the modern binlog_version==4 to store the original size of the event
           into a new status var; the converted representation goes into the relay log.
        b. the converted event is read out and the stored size is engaged in the start pos calculation.
        
        The new status is active only for events that IO thread instantiates for the sake of the conversion.
     @ sql/log_event.h
        Incrementing the max szie of MAX_SIZE_LOG_EVENT_STATUS because of the new status var;
        Defining the new status variable to hold the master's genuine event size;
        Augmenting the Query_log_event with a new member to hold a value to store/extact from the status
        var of the event packet header.
[25 Jan 2010 15:47] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/98069

2845 Andrei Elkin	2010-01-25
      Bug #47142 "slave start until" stops 1 event too late in 4.1 to 5.0 replication
      
      When replicating from 4.1 master to 5.0 slave START SLAVE UNTIL can stop too late.
      The necessary in calculating of the beginning of an event the event's length
      did not correspond to the master's genuine information at the event's execution time.
      That piece of info was changed at the event's relay-logging due to binlog_version<4 event
      conversion by IO thread.
      
      Fixed with storing the master genuine Query_log_event size into a new status
      variable at relay-logging of the event. The stored info is extacted at the event
      execution and participate further to caclulate the correct start position of the event
      in the until-pos stopping routine.
      The new status variable's algorithm will be only active when the event comes
      from the master of version < 5.0 (binlog_version < 4).
     @ mysql-test/r/rpl_until.result
        results changed.
     @ mysql-test/std_data/bug47142_master-bin.000001
        a binlog from 4.1 master to replace one of the running 5.x master is added as 
        part of Bug #47142 regression test.
     @ mysql-test/t/rpl_until.test
        Regression test for Bug #47142 is added.
     @ sql/log_event.cc
        Storing and extracting the master's genuine size of the event from the status
        var of the event packet header.
        The binlog_version<4 query-log-event is 
        a. converted into the modern binlog_version==4 to store the original size of the event
           into a new status var; the converted representation goes into the relay log.
        b. the converted event is read out and the stored size is engaged in the start pos calculation.
        
        The new status is active only for events that IO thread instantiates for the sake of the conversion.
     @ sql/log_event.h
        Incrementing the max szie of MAX_SIZE_LOG_EVENT_STATUS because of the new status var;
        Defining the new status variable to hold the master's genuine event size;
        Augmenting the Query_log_event with a new member to hold a value to store/extact from the status
        var of the event packet header.
[27 Jan 2010 13:22] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/98332

2846 Andrei Elkin	2010-01-27
      bug#47142
      improving comments
[27 Jan 2010 17:29] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/98377

3335 Andrei Elkin	2010-01-27 [merge]
      bug#47142
      merging patches prepared for 5.0 to 5.1-bt. That caused a few changes in the test file
[28 Jan 2010 10:24] Andrei Elkin
Pushed to 5.1-bt, pe trees.
[4 Feb 2010 10:18] Bugs System
Pushed into 5.1.44 (revid:joro@sun.com-20100204101444-2j32mhqroo0iiio6) (version source revid:aelkin@mysql.com-20100127172749-82epzpma1pws2xnz) (merge vers: 5.1.43) (pib:16)
[5 Feb 2010 11:46] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100204063540-9czpdmpixi3iw2yb) (version source revid:alik@sun.com-20100130220650-3md8m8w9frcwjj9f) (pib:16)
[5 Feb 2010 11:53] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20100205113942-oqovjy0eoqbarn7i) (version source revid:alik@sun.com-20100204064210-ljwanqvrjs83s1gq) (merge vers: 6.0.14-alpha) (pib:16)
[5 Feb 2010 11:58] Bugs System
Pushed into 5.5.2-m2 (revid:alik@sun.com-20100203172258-1n5dsotny40yufxw) (version source revid:alik@sun.com-20100130191336-i53i9wx67n81ridm) (merge vers: 5.5.2-m2) (pib:16)
[10 Feb 2010 4:28] Jon Stephens
Documented in the 5.1.44, 5.5.2, and 6.0.14 changelogs as follows:

      Due to a change in the size of event representations in the binary 
      log, when replicating from a MySQL 4.1 master to a slave running MySQL 
      5.0.60 or later, the START SLAVE UNTIL statement did not function 
      correctly, stopping at the wrong position in the log. Now the slave 
      detects that the master is using the older version of the binary log 
      format, and corrects for the difference in event size, so that the 
      slave stops in the correct position.

Closed. (Target = 5.1+.)
[12 Mar 2010 14:11] Bugs System
Pushed into 5.1.44-ndb-7.0.14 (revid:jonas@mysql.com-20100312135944-t0z8s1da2orvl66x) (version source revid:jonas@mysql.com-20100312115609-woou0te4a6s4ae9y) (merge vers: 5.1.44-ndb-7.0.14) (pib:16)
[12 Mar 2010 14:27] Bugs System
Pushed into 5.1.44-ndb-6.2.19 (revid:jonas@mysql.com-20100312134846-tuqhd9w3tv4xgl3d) (version source revid:jonas@mysql.com-20100312060623-mx6407w2vx76h3by) (merge vers: 5.1.44-ndb-6.2.19) (pib:16)
[12 Mar 2010 14:41] Bugs System
Pushed into 5.1.44-ndb-6.3.33 (revid:jonas@mysql.com-20100312135724-xcw8vw2lu3mijrhn) (version source revid:jonas@mysql.com-20100312103652-snkltsd197l7q2yg) (merge vers: 5.1.44-ndb-6.3.33) (pib:16)
[15 Mar 2010 4:47] Jon Stephens
No new changelog entries required. Closed.