Bug #40459 sporadic failure in rpl_ndb_denote_gap: lost connection during "stop slave"
Submitted: 31 Oct 2008 15:46
Modified: 5 Jun 2009 10:01
Reporter: Sven Sandberg
Status: Closed
Category: MySQL Server: Replication
Severity: S2 (Serious)
Version: telco-6.3+, 6.0-rpl
OS: Any
Assigned to: Martin Skold
CPU Architecture: Any
Tags: 6.0-rpl-green, Lost connection, pushbuild, rpl_ndb_denote_gap, sporadic, test failure

[31 Oct 2008 15:46] Sven Sandberg
Description:
Sporadic failure in rpl_ndb_denote_gap:

rpl_ndb.rpl_ndb_denote_gap               [ fail ]

CURRENT_TEST: rpl_ndb.rpl_ndb_denote_gap
mysqltest: In included file "./include/wait_for_slave_param.inc": At line 97: Error running query ' SHOW SLAVE STATUS': 2013 Lost connection to MySQL server during query

The result from queries just before the failure was:
stop slave;
drop table if exists t1,t2,t3,t4,t5,t6,t7,t8,t9;
reset master;
reset slave;
drop table if exists t1,t2,t3,t4,t5,t6,t7,t8,t9;
start slave;

* shutdown master *

 - saving '/dev/shm/var-n_mix-100/1/log/rpl_ndb.rpl_ndb_denote_gap/' to '/dev/shm/var-n_mix-100/log/rpl_ndb.rpl_ndb_denote_gap/'
 - found 'core.2441' (0/5)

Trying 'gdb' to get a backtrace
Core generated by '/data0/pushbuild/pb1/pb/bzr_mysql-6.0-rpl/91/mysql-6.0.8-alpha-pb91/sql/mysqld'

How to repeat:
https://intranet.mysql.com/secure/pushbuild/showpush.pl?dir=bzr_mysql-6.0-rpl&order=91 debx86-b/n_mix
xref: http://tinyurl.com/5dp486
[28 Nov 2008 10:15] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/60136

2736 He Zhenxing	2008-11-28
      BUG#40459 sporadic failure in rpl_ndb_denote_gap: lost connection during "stop slave"
      
      When the slave failed to set the heartbeat period, it did not set err_code
      to a non-zero value, which caused an assertion to fail.
      
      This patch fixes it by setting err_code to ER_SLAVE_FATAL_ERROR in that case.
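To make the failure mode concrete, here is a minimal C++ sketch of the pattern the commit message describes. It is not the MySQL server code: `setup_io_thread`, `set_heartbeat_period`, `error_invariant_holds`, and the `with_fix` switch are hypothetical names, and the value given to `ER_SLAVE_FATAL_ERROR` is a placeholder. The point it illustrates is that an early exit which leaves `err_code` at 0 makes a failure look like success, which trips any assertion that expects every failure to carry a non-zero code.

```cpp
#include <cassert>

// Hypothetical stand-in for the server's error constant; the numeric value
// is a placeholder for this sketch, not taken from the MySQL sources.
constexpr int ER_SLAVE_FATAL_ERROR = 1593;

// Hypothetical helper: simulates setting the heartbeat period and returns
// false when the setting fails.
bool set_heartbeat_period(bool master_accepts) { return master_accepts; }

// Simplified, hypothetical model of a slave I/O thread setup step.
// Returns the error code the caller will see; 0 means "no error".
int setup_io_thread(bool master_accepts, bool with_fix) {
  int err_code = 0;
  if (!set_heartbeat_period(master_accepts)) {
    // Before the fix, this branch bailed out with err_code still 0.
    if (with_fix) err_code = ER_SLAVE_FATAL_ERROR;  // the fix
    return err_code;
  }
  return 0;
}

// Models the assertion on the error path: any failure must be reported
// with a non-zero error code.
bool error_invariant_holds(bool failed, int err_code) {
  return !failed || err_code != 0;
}
```

Without the fix, `setup_io_thread(false, false)` returns 0 even though setup failed, so `error_invariant_holds(true, ...)` is false; with the fix it returns `ER_SLAVE_FATAL_ERROR` and the invariant holds.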
[15 Dec 2008 8:51] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/61643

2736 He Zhenxing	2008-12-15
      BUG#40459 sporadic failure in rpl_ndb_denote_gap: lost connection during "stop slave"
      
      When the slave failed to set the heartbeat period, it did not set err_code
      to a non-zero value, which caused an assertion to fail.
      
      This patch fixes it by setting err_code to ER_SLAVE_FATAL_ERROR in that case.
[29 Dec 2008 9:43] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/62396

2773 He Zhenxing	2008-12-29
      BUG#40459 sporadic failure in rpl_ndb_denote_gap: lost connection during "stop slave"
      
      When the slave failed to set the heartbeat period, it did not set err_code
      to a non-zero value, which caused an assertion to fail.
      
      This patch fixes it by setting err_code to ER_SLAVE_FATAL_ERROR in that case.
[29 Dec 2008 12:21] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/62407

2775 He Zhenxing	2008-12-29 [merge]
      Merge patch for BUG#40459
[5 Jan 2009 13:40] Zhenxing He
Pushed to 6.0-rpl
[7 Jan 2009 13:40] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/62595

2774 He Zhenxing	2009-01-07 [merge]
      Auto merge
[30 Jan 2009 13:30] Bugs System
Pushed into 6.0.10-alpha (revid:luis.soares@sun.com-20090129165607-wiskabxm948yx463) (version source revid:luis.soares@sun.com-20090129163120-e2ntks4wgpqde6zt) (merge vers: 6.0.10-alpha) (pib:6)
[1 Feb 2009 11:34] Jon Stephens
Discussed this issue with Luís on IRC today. He agrees that:

1. In 6.0, this isn't limited to Cluster replication but can affect replication generally, since MASTER_HEARTBEAT_PERIOD is available in 6.0-main.

2. Since MASTER_HEARTBEAT_PERIOD is also available in MySQL Cluster NDB 6.3 and 6.4, the fix should be backported to the mysql-5.1-telco-6.3 and mysql-5.1-telco-6.4 trees. However, there's no need to put this into 5.1-main since the affected feature does not exist there.

Documented bugfix in the 6.0.10 changelog as follows:

        When CHANGE MASTER TO ... SET MASTER_HEARTBEAT_PERIOD ... 
        failed, no error code was set.

Changed Category to Replication.

Set status to Verified pending a push of the fix to the telco-6.3 and telco-6.4 trees.

Admittedly this is a rather minor fix, but there's no reason why we should wait for the same bug to crop up in one of the 5.1-based Cluster releases, either. ;)
[5 Jun 2009 10:01] Jon Stephens
Also documented in the NDB-6.2.15 and 6.3.12 changelogs. (Martin found the commit where it was merged.) Closed.