Bug #31958 many parallell create/delete ndb may hang ndbapi
Submitted: 30 Oct 2007 15:54 Modified: 15 Dec 2007 10:45
Reporter: Geert Vanderkelen Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Cluster: Replication Severity:S2 (Serious)
Version:5.1 OS:Any
Assigned to: Tomas Ulin CPU Architecture:Any

[30 Oct 2007 15:54] Geert Vanderkelen
Description:
A slave got stuck on a relay log leading to piling up logs and eventually running out of disk space.

Parts of the SHOW SLAVE STATUS (some files/hosts/users have been renamed because of privacy issues):

 Slave_IO_State: Waiting for master to send event 
          Master_Host: 192.168.0.2
          Master_User: repl 
          Master_Port: 3308 
        Connect_Retry: 3 
      Master_Log_File: Slave-3_210_binlog.000571 
  Read_Master_Log_Pos: 21718884 
       Relay_Log_File: Slave-3_110_relaylog.000127 
        Relay_Log_Pos: 88573599 
Relay_Master_Log_File: Subs-3_210_binlog.000284 
     Slave_IO_Running: Yes 
    Slave_SQL_Running: Yes
..
  Exec_Master_Log_Pos: 88573450 
      Relay_Log_Space: 76802774879
..
Seconds_Behind_Master: 108451 

However, relay log Slave-3_110_relaylog.000125 was still available and the header looks like this:

/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#71027 13:30:52 server id 110  end_log_pos 102  Start: binlog v 4, server v 5.1.15-ndb-6.1.22-log created 71027 13:30:52
# at 102
#71027 13:24:37 server id 210  end_log_pos 0    Start: binlog v 4, server v 5.1.15-ndb-6.1.22-log created 71027 13:24:37

The odd thing here is that the Master (= id 210) binary log timestamp shows a timestamp of a binary log which doesn't exists anymore on the Master.
The relay log 125 was rotated exactly when the binary log got rotated.

The error seems to be consistent with each binary rotation on the master side:

13:30:52 SLAVE	mysqld rotates relaylog.000125 -> relaylog.000126 (*)
13:30:52 SLAVE	mysqld rotates relaylog.000126 -> relaylog.000127 
13:37:07 MASTER	mysqld rotates binlog.000284 -> binlog.000285
13:37:07 SLAVE	mysqld rotates relaylog.000127 -> relaylog.000128
13:37:07 SLAVE	mysqld rotates relaylog.000128 -> relaylog.000129 (*)
13:37:07 SLAVE	mysqld rotates relaylog.000129 -> relaylog.000130
13:43:21 MASTER	mysqld rotates binlog.000285 -> binlog.000286
13:43:21 SLAVE	mysqld rotates relaylog.000130 -> relaylog.000131
13:43:21 SLAVE	mysqld rotates relaylog.000131 -> relaylog.000132 (*)
13:43:21 SLAVE	mysqld rotates relaylog.000132 -> relaylog.000133
13:49:37 MASTER	mysqld rotates binlog.000286 -> binlog.000287
13:49:37 SLAVE	mysqld rotates relaylog.000133 -> relaylog.000134
13:49:37 SLAVE	mysqld rotates relaylog.000134 -> relaylog.000135 (*)
13:49:37 SLAVE	mysqld rotates relaylog.000135 -> relaylog.000136

so 125, 128, 131 and 134 (and lots more) are showing similar effects.

Binary logs were written at 256M/±6min.

How to repeat:
No repeatable test case.
[20 Nov 2007 13:58] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/38132

ChangeSet@1.2498, 2007-11-20 15:02:37+01:00, tomas@whalegate.ndb.mysql.com +1 -0
  Bug #31958  	Slave stuck and stops processing relay logs
[21 Nov 2007 14:11] Jon Stephens
Documented bugfix in 5.1.15-ndb-6.1.23 changelog. Left in Patch Queued status pending further pushes/merges.
[23 Nov 2007 10:52] Jon Stephens
Also noted in 5.1.22-ndb-6.2.9 changelog. Left status as PQ.
[14 Dec 2007 17:07] Bugs System
Pushed into 5.1.23-rc
[14 Dec 2007 17:08] Bugs System
Pushed into 6.0.5-alpha
[15 Dec 2007 10:45] Jon Stephens
Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release.

If necessary, you can access the source repository and build the latest available version, including the bug fix. More information about accessing the source trees is available at

    http://dev.mysql.com/doc/en/installing-source.html

Bugfix also now documented in 5.1.23 and 6.0.5 changelogs; closed.