Bug #8693 Test 'rpl_log_pos' fails sometimes
Submitted: 22 Feb 2005 16:25 Modified: 25 Feb 2010 10:33
Reporter: Joerg Bruehe Email Updates:
Status: Can't repeat Impact on me:
None 
Category:Tests: Replication Severity:S7 (Test Cases)
Version:5.0.3 OS:IBM AIX (AIX 5.2, QNX, Mac OS X)
Assigned to: Serge Kozlov CPU Architecture:Any

[22 Feb 2005 16:25] Joerg Bruehe
Description:
Build based on changeset
  1.1880 05/02/21 20:41:48 monty@mysql.com +8 -0
  Generate warning in ha_delete_table() if files is missing in handler
fails in 'classic' (others not yet done, 'debug' makes compiler crash). Result output:

-------------------------------------------------------
*** r/rpl_log_pos.result        Tue Feb 22 01:56:23 2005
--- r/rpl_log_pos.reject        Tue Feb 22 13:34:17 2005
***************
*** 21,33 ****
  start slave;
  show slave status;
  Slave_IO_State        Master_Host     Master_User     Master_Port     Connect_Retry   Master_Log_File Read_Master_Log_Pos
    Relay_Log_File  Relay_Log_Pos   Relay_Master_Log_File   Slave_IO_Running        Slave_SQL_Running       Replicate_Do_DB
Replicate_Ignore_DB     Replicate_Do_Table      Replicate_Ignore_Table  Replicate_Wild_Do_Table Replicate_Wild_Ignore_Table
    Last_Errno      Last_Error      Skip_Counter    Exec_Master_Log_Pos     Relay_Log_Space Until_Condition Until_Log_File
Until_Log_Pos   Master_SSL_Allowed      Master_SSL_CA_File      Master_SSL_CA_Path      Master_SSL_Cert Master_SSL_Cipher
    Master_SSL_Key  Seconds_Behind_Master
! #     127.0.0.1       root    MASTER_PORT     1       master-bin.000001       73      #       #       master-bin.000001
    No      Yes                                                     0               0       73      #       None
0       No                                              #
  stop slave;
  change master to master_log_pos=173;
  start slave;
  show slave status;
  Slave_IO_State        Master_Host     Master_User     Master_Port     Connect_Retry   Master_Log_File Read_Master_Log_Pos
    Relay_Log_File  Relay_Log_Pos   Relay_Master_Log_File   Slave_IO_Running        Slave_SQL_Running       Replicate_Do_DB
Replicate_Ignore_DB     Replicate_Do_Table      Replicate_Ignore_Table  Replicate_Wild_Do_Table Replicate_Wild_Ignore_Table
    Last_Errno      Last_Error      Skip_Counter    Exec_Master_Log_Pos     Relay_Log_Space Until_Condition Until_Log_File
Until_Log_Pos   Master_SSL_Allowed      Master_SSL_CA_File      Master_SSL_CA_Path      Master_SSL_Cert Master_SSL_Cipher
    Master_SSL_Key  Seconds_Behind_Master
! #     127.0.0.1       root    MASTER_PORT     1       master-bin.000001       173     #       #       master-bin.000001
    No      Yes                                                     0               0       173     #       None
0       No                                              #
  show master status;
  File  Position        Binlog_Do_DB    Binlog_Ignore_DB
  master-bin.000001     96
--- 21,33 ----
  start slave;
  show slave status;
  Slave_IO_State        Master_Host     Master_User     Master_Port     Connect_Retry   Master_Log_File Read_Master_Log_Pos
    Relay_Log_File  Relay_Log_Pos   Relay_Master_Log_File   Slave_IO_Running        Slave_SQL_Running       Replicate_Do_DB
Replicate_Ignore_DB     Replicate_Do_Table      Replicate_Ignore_Table  Replicate_Wild_Do_Table Replicate_Wild_Ignore_Table
    Last_Errno      Last_Error      Skip_Counter    Exec_Master_Log_Pos     Relay_Log_Space Until_Condition Until_Log_File
Until_Log_Pos   Master_SSL_Allowed      Master_SSL_CA_File      Master_SSL_CA_Path      Master_SSL_Cert Master_SSL_Cipher
    Master_SSL_Key  Seconds_Behind_Master
! #     127.0.0.1       root    MASTER_PORT     1       master-bin.000001       73      #       #       master-bin.000001
    No      Yes                                                     0       Rolling back unfinished transaction (no COMMIT o
r ROLLBACK) from relay log. A probable cause is that the master died while writing the transaction to its binary log.   0
    73      #       None            0       No                                              #
  stop slave;
  change master to master_log_pos=173;
  start slave;
  show slave status;
  Slave_IO_State        Master_Host     Master_User     Master_Port     Connect_Retry   Master_Log_File Read_Master_Log_Pos
    Relay_Log_File  Relay_Log_Pos   Relay_Master_Log_File   Slave_IO_Running        Slave_SQL_Running       Replicate_Do_DB
Replicate_Ignore_DB     Replicate_Do_Table      Replicate_Ignore_Table  Replicate_Wild_Do_Table Replicate_Wild_Ignore_Table
    Last_Errno      Last_Error      Skip_Counter    Exec_Master_Log_Pos     Relay_Log_Space Until_Condition Until_Log_File
Until_Log_Pos   Master_SSL_Allowed      Master_SSL_CA_File      Master_SSL_CA_Path      Master_SSL_Cert Master_SSL_Cipher
    Master_SSL_Key  Seconds_Behind_Master
! #     127.0.0.1       root    MASTER_PORT     1       master-bin.000001       173     #       #       master-bin.000001
    No      Yes                                                     0       Rolling back unfinished transaction (no COMMIT o
r ROLLBACK) from relay log. A probable cause is that the master died while writing the transaction to its binary log.   0
    173     #       None            0       No                                              #
  show master status;
  File  Position        Binlog_Do_DB    Binlog_Ignore_DB
  master-bin.000001     96
-------------------------------------------------------

How to repeat:
Build + test there.

I saved the binary tree on that machine 'aix52' in a tree '~mysqldev/KEEP_503_20050222_classic'. Please delete it when it is not needed any more - disk space is scarce everywhere.
[23 Feb 2005 15:58] Lars Thalmann
After analyzing this bug, the followings has been found out:

The bug only shows up if one runs 
  ./mysql-test-run rpl_log rpl_log_pos

Running the following, shows no failure:
  ./mysql-test-run rpl_log_pos

The problems shows up in the following code:

int Format_description_log_event::exec_event(struct st_relay_log_info* rli)
{
  DBUG_ENTER("Format_description_log_event::exec_event");

  /* save the information describing this binlog */
  delete rli->relay_log.description_event_for_exec;
  rli->relay_log.description_event_for_exec= this;

  /*
    As a transaction NEVER spans on 2 or more binlogs:
    if we have an active transaction at this point, the master died
    while writing the transaction to the binary log, i.e. while
    flushing the binlog cache to the binlog. As the write was started,
    the transaction had been committed on the master, so we lack of
    information to replay this transaction on the slave; all we can do
    is stop with error.
    Note: this event could be sent by the master to inform us of the
    format of its binlog; in other words maybe it is not at its
    original place when it comes to us; we'll know this by checking
    log_pos ("artificial" events have log_pos == 0).
  */
  if (!artificial_event && created && thd->transaction.all.nht)
  {
    slave_print_error(rli, 0, "Rolling back unfinished transaction (no "
                      "COMMIT or ROLLBACK) from relay log. A probable cause "
                      "is that the master died while writing the transaction "
                      "to its binary log.");
    end_trans(thd, ROLLBACK);
  }

When this is executed (and there is a failure), we have that
  thd->options & OPTION_BEGIN = 0
  thd->options & OPTION_NOT_AUTOCOMMIT = 0

The first of these tests (OPTION_BEGIN) was removed from the code 
earlier and adding it back removes the failure.  According to Serg 
this is not enough however, and one needs to study this in more detail.
[28 Feb 2005 11:18] Lars Thalmann
Many changes has been done to the 5.0 tree.
Have discussed with Serg and Brian.  Since this 
failure did not repeat itself in any of the later tests, 
it is being closed.
[24 Mar 2005 17:44] Joerg Bruehe
This symptom re-occurs in the final release build of 5.0.3 on some machines:
- AIX 4.3 "classic" in both default + PS-protocol tests,
- AIX 5.2 (64 bit) "classic" just in default test (first difference only, lines 21 - 27),
- QNX only in PS-protocol test, and also earlier in test case: Lines 9 - 15 (diff added below),
- Max OS X 10.2 (not 10.3 !) "classic" just in default test, and only for lines 9 - 15.

Difference from QNX:
-------------------------------------------------------
*** r/rpl_log_pos.result        Wed Mar 23 02:46:44 2005
--- r/rpl_log_pos.reject        Thu Mar 24 08:19:01 2005
***************
*** 9,15 ****
  master-bin.000001     98
  show slave status;
  Slave_IO_State        Master_Host     Master_User     Master_Port     Connect_Retry   Master_Log_File Read_Master_Log_Pos     Relay_Log_File  Relay_Log_Pos   Relay_Master_Log_File   Slave_IO_Running        Slave_SQL_Running       Replicate_Do_DB Replicate_Ignore_DB     Replicate_Do_Table      Replicate_Ignore_Table  Replicate_Wild_Do_Table Replicate_Wild_Ignore_Table     Last_Errno      Last_Error      Skip_Counter    Exec_Master_Log_Pos     Relay_Log_Space Until_Condition Until_Log_File  Until_Log_Pos   Master_SSL_Allowed      Master_SSL_CA_File      Master_SSL_CA_Path      Master_SSL_Cert Master_SSL_Cipher       Master_SSL_Key  Seconds_Behind_Master
! #     127.0.0.1       root    MASTER_PORT     1       master-bin.000001       98      #       #       master-bin.000001       Yes     Yes                                                     0               0       98      #       None            0       No                                              #
  stop slave;
  change master to master_log_pos=73;
  start slave;
--- 9,15 ----
  master-bin.000001     98
  show slave status;
  Slave_IO_State        Master_Host     Master_User     Master_Port     Connect_Retry   Master_Log_File Read_Master_Log_Pos     Relay_Log_File  Relay_Log_Pos   Relay_Master_Log_File   Slave_IO_Running        Slave_SQL_Running       Replicate_Do_DB Replicate_Ignore_DB     Replicate_Do_Table      Replicate_Ignore_Table  Replicate_Wild_Do_Table Replicate_Wild_Ignore_Table     Last_Errno      Last_Error      Skip_Counter    Exec_Master_Log_Pos     Relay_Log_Space Until_Condition Until_Log_File  Until_Log_Pos   Master_SSL_Allowed      Master_SSL_CA_File      Master_SSL_CA_Path      Master_SSL_Cert Master_SSL_Cipher       Master_SSL_Key  Seconds_Behind_Master
! #     127.0.0.1       root    MASTER_PORT     1       master-bin.000001       98      #       #       master-bin.000001       Yes     Yes                                                     0       Rolling back unfinished transaction (no COMMIT or ROLLBACK) from relay log. A probable cause is that the master died while writing the transaction to its binary log.   0       98      #       None            0       No                                              #
  stop slave;
  change master to master_log_pos=73;
  start slave;
(( continues with lines 21 - 33 as in original report ))

It might be significant that this was noticed only in "classic" builds (without InnoDB).
[29 Jan 2007 12:11] Sveta Smirnova
Bug #25713 can be dependent from this one
[27 Nov 2007 16:26] Ingo Strüwing
Raised priority. It presented itself as a crashing bug on Debian x86_64. But it is not always repeatable.

strlen () from /lib/libc.so.6
Protocol::store (this=0xffcf70, from=0x8f8f8f8f8f8f8f8f <Address 0x8f8f8f8f8f8f8f8f out of bounds>, cs=0xf6e2c0) at protocol.cc:723
show_master_info (thd=0xffcbc8, mi=0xff1ca0) at slave.cc:2457
mysql_execute_command (thd=0xffcbc8) at sql_parse.cc:2858
mysql_parse (thd=0xffcbc8, inBuf=0x1006df8 "SHOW SLAVE STATUS", length=18, found_semicolon=0x40881e70) at sql_parse.cc:6113
dispatch_command (command=COM_QUERY, thd=0xffcbc8, packet=0xffed69 "", packet_length=20) at sql_parse.cc:1821
do_command (thd=0xffcbc8) at sql_parse.cc:1595
handle_one_connection (arg=0xffcbc8) at sql_parse.cc:1201
start_thread () from /lib/libpthread.so.0
clone () from /lib/libc.so.6

How to repeat:
OS: Debian GNU/Linux/x86_64
OS: Debian Sid kernel 2.6.22  SMP PREEMPT
gcc (GCC) 4.2.3 20071014 (prerelease) (Debian 4.2.2-3)

bk clone bk-internal.mysql.com:/home/bk/mysql-5.0-engines mysql-5.0-axmrg
cd mysql-5.0-axmrg
BUILD/compile-pentium-debug-max --with-debug=full
make test-force

Disabling the test case. Please re-enable after fix.
[28 Nov 2007 10:24] Bugs System
Pushed into 6.0.4-alpha
[28 Nov 2007 10:26] Bugs System
Pushed into 5.1.23-rc
[28 Nov 2007 10:27] Bugs System
Pushed into 5.0.54
[23 Apr 2008 9:57] Serge Kozlov
Couldn't repeat so probably with others bugs
[24 Apr 2008 21:01] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/45975

ChangeSet@1.2613, 2008-04-25 00:56:29+04:00, skozlov@mysql.com +1 -0
  Bug#8693 is enabled.
[31 Aug 2009 12:23] Andrei Elkin
Test case rpl_log_pos is still disabled in 5.0-bugteam tree.
[21 Dec 2009 11:40] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/95210

2827 Serge Kozlov	2009-12-21
      Bug#8693, Bug#45521.
      modified:
        mysql-test/t/disabled.def
[14 Jan 2010 8:27] Bugs System
Pushed into 5.0.90 (revid:joro@sun.com-20100114082402-05fod2h6z9x9wok8) (version source revid:serge.kozlov@sun.com-20091221114008-v9c2n1rd15cwjr6p) (merge vers: 5.0.89) (pib:16)
[15 Jan 2010 9:01] Bugs System
Pushed into 5.1.43 (revid:joro@sun.com-20100115085139-qkh0i0fpohd9u9p5) (version source revid:satya.bn@sun.com-20091223070903-6rn5kgl3chedqnxd) (merge vers: 5.1.42) (pib:16)
[5 Feb 2010 11:49] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100204063540-9czpdmpixi3iw2yb) (version source revid:alik@sun.com-20100119163614-172adculixyu26j5) (pib:16)
[5 Feb 2010 11:56] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20100205113942-oqovjy0eoqbarn7i) (version source revid:alik@sun.com-20100204064210-ljwanqvrjs83s1gq) (merge vers: 6.0.14-alpha) (pib:16)
[5 Feb 2010 12:01] Bugs System
Pushed into 5.5.2-m2 (revid:alik@sun.com-20100203172258-1n5dsotny40yufxw) (version source revid:alexey.kopytov@sun.com-20091225105650-qletdbs0wz9sx5nc) (merge vers: 5.5.1-m2) (pib:16)
[23 Feb 2010 9:12] Luis Soares
As of today, rpl_log_pos is removed from the disabled list for
the following trees:

  - mysql-5.0-bugteam
  - mysql-5.1-bugteam
  - mysql-pe
  - mysql-trunk-bugfixing
  - mysql-next-mr-bugfixing
  - mysql-6.0-codebase-bugfixing

Looking at xref, I find that last failure in main trees dates
back to 2009-12-29:

  - http://tinyurl.com/yeyyh7c

And the latest failures presented in xref seem to have been fixed
in BUG#47743 or were transient platform issues.

As such, Serge, I think we can close this bug by now.
[25 Feb 2010 10:33] Serge Kozlov
Agreed with Luis
[12 Mar 2010 14:12] Bugs System
Pushed into 5.1.44-ndb-7.0.14 (revid:jonas@mysql.com-20100312135944-t0z8s1da2orvl66x) (version source revid:jonas@mysql.com-20100312115609-woou0te4a6s4ae9y) (merge vers: 5.1.44-ndb-7.0.14) (pib:16)
[12 Mar 2010 14:28] Bugs System
Pushed into 5.1.44-ndb-6.2.19 (revid:jonas@mysql.com-20100312134846-tuqhd9w3tv4xgl3d) (version source revid:jonas@mysql.com-20100312060623-mx6407w2vx76h3by) (merge vers: 5.1.44-ndb-6.2.19) (pib:16)
[12 Mar 2010 14:44] Bugs System
Pushed into 5.1.44-ndb-6.3.33 (revid:jonas@mysql.com-20100312135724-xcw8vw2lu3mijrhn) (version source revid:jonas@mysql.com-20100312103652-snkltsd197l7q2yg) (merge vers: 5.1.44-ndb-6.3.33) (pib:16)