Bug #31484 Cluster LOST_EVENTS entry not added to binlog on mysqld restart.
Submitted: 9 Oct 2007 15:41 Modified: 15 Dec 2007 10:42
Reporter: Matthew Montgomery Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Cluster: Replication Severity:S2 (Serious)
Version:5.1.22 OS:Any
Assigned to: Tomas Ulin CPU Architecture:Any
Triage: D2 (Serious)

[9 Oct 2007 15:41] Matthew Montgomery
Description:
From Bug #21494:

I see no LOST_EVENTS entries in my binlog on restart

If I cause the running master mysqld node to disconnect and reconnect to the cluster, by severing the network link, it will add a LOST_EVENTS entry to the binlog as expected. However, when the master mysqld node crashes or has a normal restart it will not create the LOST_EVENTS entry in the binlog.  This entry should be added to the binlog at each startup.  Without it the slave will not know that the master may have missed entries while offline.  If no failover to a secondary replication channel has occured, the slave will reconnect to the master and resume replication and miss log entries.

How to repeat:
slave> SLAVE STOP;
master> RESET MASTER;
master> USE test
master> CREATE TABLE `t1` (
  `a` int(11) NOT NULL AUTO_INCREMENT,
  `b` varchar(20) DEFAULT NULL,
  `c` int(11) DEFAULT NULL,
  PRIMARY KEY (`a`)
) ENGINE=ndbcluster DEFAULT CHARSET=latin1
slave> CHANGE MASTER TO master_host='master', master_user='repl', master_password='node1',
master_log_file='master-bin.000001', master_log_pos=4,  master_port=33511;
slave> SLAVE START
slave> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: master
                  Master_User: repl
                  Master_Port: 33511
                Connect_Retry: 60
              Master_Log_File: master-bin.000001
          Read_Master_Log_Pos: 106
               Relay_Log_File: slave-relay-bin.000002
                Relay_Log_Pos: 251
        Relay_Master_Log_File: master-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
... 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 106
              Relay_Log_Space: 408
...
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
1 row in set (0.00 sec)

shell> hugoLoad --records=1000000 --batch=100 --database=test t1

While hugoLoad is in progress shutdown and restart 'master'

 master$> ./bin/mysqladmin shutdown 
 master$> ./bin/mysqld_safe  --defaults-file=./my.cnf  &

master> USE test
master> SELECT count(*) FROM t1;
+----------+
| count(*) |
+----------+
|  1000000 | 
+----------+
1 row in set (0.01 sec)

master> SHOW MASTER STATUS\G
*************************** 1. row ***************************
            File: master-bin.000002
        Position: 13232653
    Binlog_Do_DB: 
Binlog_Ignore_DB: 
1 row in set (0.00 sec)

slave> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
               Slave_IO_State: Reconnecting after a failed master event read
                  Master_Host: master
                  Master_User: repl
                  Master_Port: 33511
                Connect_Retry: 60
              Master_Log_File: master-bin.000001
          Read_Master_Log_Pos: 5929762
               Relay_Log_File: slave-relay-bin.000004
                Relay_Log_Pos: 5929907
        Relay_Master_Log_File: master-bin.000001
             Slave_IO_Running: No
            Slave_SQL_Running: Yes
...
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 5929762
              Relay_Log_Space: 5930064
...
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 2013
                Last_IO_Error: error reconnecting to master 'repl@master:33511' -
retry-time: 60  retries: 86400
               Last_SQL_Errno: 0
               Last_SQL_Error: 
1 row in set (0.00 sec)

<no action>

slave> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: master
                  Master_User: repl
                  Master_Port: 33511
                Connect_Retry: 60
              Master_Log_File: master-bin.000002
          Read_Master_Log_Pos: 13232653
               Relay_Log_File: slave-relay-bin.000006
                Relay_Log_Pos: 13232798
        Relay_Master_Log_File: master-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
...
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 13232653
              Relay_Log_Space: 14493936
...
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 2013
                Last_IO_Error: error reconnecting to master 'repl@master:33511' -
retry-time: 60  retries: 86400
               Last_SQL_Errno: 0
               Last_SQL_Error: 
1 row in set (0.00 sec)

slave> USE test
slave> SELECT COUNT(*) FROM t1; 
+----------+
| COUNT(*) |
+----------+
|   747744 | 
+----------+
1 row in set (0.03 sec)

master.Position = slave.Exec_master_log_pos 
but
master.t1.count(*) != slave.t1.count(*)

Suggested fix:
.
[9 Oct 2007 15:42] Matthew Montgomery
No LOST_EVENTS entry in at end of binlog 000001 at restart.

mysql> show binlog events in 'master-bin.000001' from 7179716;
+------------------+---------+------------+-----------+-------------+---------------------
-----------+
| Log_name         | Pos     | Event_type | Server_id | End_log_pos | Info                
          |
+------------------+---------+------------+-----------+-------------+---------------------
-----------+
| master-bin.000001 | 7179716 | Write_rows |         1 |     7180742 | table_id: 21       
           |
| master-bin.000001 | 7180742 | Write_rows |         1 |     7181760 | table_id: 21       
           |
| master-bin.000001 | 7181760 | Write_rows |         1 |     7182779 | table_id: 21       
           |
| master-bin.000001 | 7182779 | Write_rows |         1 |     7183808 | table_id: 21       
           |
| master-bin.000001 | 7183808 | Write_rows |         1 |     7184848 | table_id: 21       
           |
| master-bin.000001 | 7184848 | Write_rows |         1 |     7185888 | table_id: 21       
           |
| master-bin.000001 | 7185888 | Write_rows |         1 |     7186924 | table_id: 21       
           |
| master-bin.000001 | 7186924 | Write_rows |         1 |     7187939 | table_id: 21       
           |
| master-bin.000001 | 7187939 | Write_rows |         1 |     7188974 | table_id: 21       
           |
| master-bin.000001 | 7188974 | Write_rows |         1 |     7190013 | table_id: 21       
           |
| master-bin.000001 | 7190013 | Write_rows |         1 |     7190533 | table_id: 21 flags:
STMT_END_F |
| master-bin.000001 | 7190533 | Query      |         1 |     7190598 | COMMIT             
           |
| master-bin.000001 | 7190598 | Stop       |         1 |     7190617 |                    
           |
+------------------+---------+------------+-----------+-------------+---------------------
-----------+
13 rows in set (0.00 sec)

No LOST_EVENTS entry at the beginning of 000002 at restart.

mysql> show binlog events in 'master-bin.000002' limit 10;
+------------------+------+-------------+-----------+-------------+-----------------------
--------------------------------+
| Log_name         | Pos  | Event_type  | Server_id | End_log_pos | Info                  
                               |
+------------------+------+-------------+-----------+-------------+-----------------------
--------------------------------+
| master-bin.000002 |    4 | Format_desc |         1 |         106 | Server ver:
5.1.22-ndb-6.3.2-telco-log, Binlog ver: 4 |
| master-bin.000002 |  106 | Query       |         1 |         170 | BEGIN                
                                |
| master-bin.000002 |  170 | Table_map   |         1 |         215 | table_id: 18
(test.t1)                                |
| master-bin.000002 |  215 | Table_map   |         1 |         277 | table_id: 16
(mysql.ndb_apply_status)                 |
| master-bin.000002 |  277 | Write_rows  |         1 |         336 | table_id: 16         
                                |
| master-bin.000002 |  336 | Write_rows  |         1 |        1376 | table_id: 18         
                                |
| master-bin.000002 | 1376 | Write_rows  |         1 |        2407 | table_id: 18         
                                |
| master-bin.000002 | 2407 | Write_rows  |         1 |        3422 | table_id: 18         
                                |
| master-bin.000002 | 3422 | Write_rows  |         1 |        4447 | table_id: 18         
                                |
| master-bin.000002 | 4447 | Write_rows  |         1 |        5483 | table_id: 18         
                                |
+------------------+------+-------------+-----------+-------------+-----------------------
--------------------------------+
10 rows in set (0.00 sec)
[1 Nov 2007 14:05] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/36874

ChangeSet@1.2572, 2007-11-01 15:08:00+01:00, tomas@whalegate.ndb.mysql.com +5 -0
  Bug #31484  	Cluster LOST_EVENTS entry not added to binlog on mysqld restart.
[1 Nov 2007 18:58] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/36915

ChangeSet@1.2573, 2007-11-01 20:00:44+01:00, tomas@whalegate.ndb.mysql.com +3 -0
  Bug #31484    Cluster LOST_EVENTS entry not added to binlog on mysqld restart
  - correction
[6 Nov 2007 14:09] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/37184

ChangeSet@1.2581, 2007-11-06 15:12:27+01:00, tomas@whalegate.ndb.mysql.com +1 -0
  Bug #31484    Cluster LOST_EVENTS entry not added to binlog on mysqld restart
  - correction, do not insert GAP on first startup
[8 Nov 2007 13:32] Jon Stephens
Documented bugfix in 5.1.22-ndb-6.2.8 and 5.1.22-ndb-6.3.6 changelogs; left in PQ status.
[14 Dec 2007 17:08] Bugs System
Pushed into 5.1.23-rc
[14 Dec 2007 17:09] Bugs System
Pushed into 6.0.5-alpha
[15 Dec 2007 10:42] Jon Stephens
Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release.

If necessary, you can access the source repository and build the latest available version, including the bug fix. More information about accessing the source trees is available at

    http://dev.mysql.com/doc/en/installing-source.html

Bugfix now also documented in 5.1.23 and 6.0.5 changelogs; closed.