MySQL Bugs: #11249: Starting slave cluster from masters dump fails with error 4350

Bug #11249	Starting slave cluster from masters dump fails with error 4350
Submitted:	10 Jun 2005 17:42	Modified:	17 Jun 2005 19:21
Reporter:	Jonathan Miller
Status:	Closed
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S1 (Critical)
Version:	5.1.0-wl2325-wl1354-new	OS:	Linux (Linux)
Assigned to:	Tomas Ulin	CPU Architecture:	Any

Description:
Following the instructions in the cluster replication documentation for starting a slave cluster from a dump of the master cluster, I found that the slave process actually fails.

On the slave, "show slave status\G" returns:
                 Last_Errno: 4350
                 Last_Error: Error in Delete_rows event: commit of row events failed

The slave error log shows:

050610 19:16:00 [Note] Slave SQL thread initialized, starting replication in log 'master1.000001' at position 33439325, relay log './ndb10-relay-bin.000001' position: 4
050610 19:16:00 [Note] Slave I/O thread: connected to master 'rep@ndb08:3307',  replication started in log 'master1.000001' at position 33439325
050610 19:16:01 [ERROR] Slave: Error in Delete_rows event: row application failed, Error_code: 4350
050610 19:16:01 [ERROR] Slave: Error in Delete_rows event: commit of row events failed, Error_code: 4350
050610 19:16:01 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'master1.000001' position 33502184
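The log above gives both the start position (33439325) and the position where the SQL thread stopped (33502184). As a rough sketch, the two helpers below pull the stop position out of an error log and compute how far into the binlog the slave got before failing. The function names `stop_position` and `bytes_applied` are made up for this illustration and are not part of any MySQL tool.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not MySQL utilities.

stop_position() {
    # Reads error-log lines on stdin; prints "<file> <position>" for each
    # "We stopped at log ..." message found.
    sed -n "s/.*We stopped at log '\([^']*\)' position \([0-9][0-9]*\).*/\1 \2/p"
}

bytes_applied() {
    # $1 = position replication started at, $2 = position it stopped at
    start=$1
    stop=$2
    echo $(( stop - start ))
}

echo "We stopped at log 'master1.000001' position 33502184" | stop_position
bytes_applied 33439325 33502184
```

With the values from this report the second call prints 62859, so the failure occurred about 62 KB past the start position in master1.000001.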

How to repeat:
Create two clusters (master and slave)
Start master cluster
Start slave with --skip-slave-start;
Create the bank database on the master;
mysql>create database BANK;
mysql>create table XXX; (create all the bank tables)
Start the bank test and allow it to run for a while.
Take a dump from master as follows:
../bin/ndb_mgm ndb08:14000 -e "start backup"

Copy all backup files to the slave.

On the slave:
mysql> reset slave;
mysql>create database BANK;

Load the dump from master:
../../../bin/ndb_restore -c ndb10:14000 -n 2 -e -b 1 -m -r ./
../../../bin/ndb_restore -c ndb10:14000 -n 4 -e -b 1  -r ./
../../../bin/ndb_restore -c ndb10:14000 -n 3 -e -b 1  -r ./
../../../bin/ndb_restore -c ndb10:14000 -n 5 -e -b 1  -r ./
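The four restore commands differ only in the node ID, and only the first one carries -m. A minimal dry-run sketch of that pattern follows; it prints the commands rather than running them. The function name `print_restore_cmds` is hypothetical, and restoring metadata with the first node only is an assumption taken from the commands in this report.

```shell
#!/bin/sh
# Dry-run sketch: print one ndb_restore command per data node.
# Assumption: -m is passed for the first node listed only, mirroring
# the commands in this report. Nothing is executed.

print_restore_cmds() {
    connect=$1      # management server connect string, e.g. ndb10:14000
    backup_id=$2    # backup ID, 1 in this report
    backup_dir=$3   # directory holding the copied backup files
    shift 3         # remaining arguments are data node IDs

    meta="-m "      # present for the first node only
    for node in "$@"; do
        echo "ndb_restore -c $connect -n $node -e -b $backup_id ${meta}-r $backup_dir"
        meta=""
    done
}

print_restore_cmds ndb10:14000 1 ./ 2 4 3 5
```

Piping the output to `sh` would run the commands in order; printing first makes it easy to check the node list before touching the slave cluster.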

Find the epoch on the slave:
SELECT @latest:=MAX(epoch) FROM cluster_replication.apply_status;
+---------------------+
| @latest:=MAX(epoch) |
+---------------------+
|               37390 |
+---------------------+
1 row in set (0.08 sec)

Find the binlog position on the master:

mysql> SELECT     @file:=SUBSTRING_INDEX(File, '/', -1), @pos:=Position     FROM cluster_replication.binlog_index     WHERE epoch > 37390 ORDER BY epoch ASC LIMIT 1;
+---------------------------------------+----------------+
| @file:=SUBSTRING_INDEX(File, '/', -1) | @pos:=Position |
+---------------------------------------+----------------+
| master1.000001                        |       33439325 |
+---------------------------------------+----------------+
1 row in set (0.00 sec)

Set the start position on the slave:
mysql>  CHANGE MASTER TO     MASTER_LOG_FILE='master1.000001',     MASTER_LOG_POS=33439325;
Query OK, 0 rows affected (0.00 sec)
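The epoch, the binlog lookup, and the CHANGE MASTER statement form a chain in which each value is copied by hand into the next statement. The sketch below builds the two statements from their inputs so the values are not retyped. Both function names are hypothetical, and the functions only print SQL; sending it to the master or slave with the mysql client is left to the caller.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical. Nothing is sent to a server.

binlog_pos_query() {
    epoch=$1    # MAX(epoch) read from cluster_replication.apply_status on the slave
    case $epoch in
        ''|*[!0-9]*) echo "epoch must be a non-negative integer" >&2; return 1 ;;
    esac
    printf "SELECT SUBSTRING_INDEX(File, '/', -1), Position FROM cluster_replication.binlog_index WHERE epoch > %s ORDER BY epoch ASC LIMIT 1;\n" "$epoch"
}

change_master_stmt() {
    file=$1     # binlog file name returned by the query above
    pos=$2      # binlog position returned by the query above
    printf "CHANGE MASTER TO MASTER_LOG_FILE='%s', MASTER_LOG_POS=%s;\n" "$file" "$pos"
}

binlog_pos_query 37390
change_master_stmt master1.000001 33439325
```

Rejecting a non-numeric epoch up front guards against pasting an empty or truncated value into the WHERE clause, which would otherwise select the wrong starting position.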

Start the slave:
mysql> start slave;
Query OK, 0 rows affected (0.00 sec)

Look at the slave status:

mysql> show slave status\G;
*************************** 1. row ***************************
             Slave_IO_State: Queueing master event to the relay log
                Master_Host: ndb08
                Master_User: rep
                Master_Port: 3307
              Connect_Retry: 1
            Master_Log_File: master1.000001
        Read_Master_Log_Pos: 66922963
             Relay_Log_File: ndb10-relay-bin.000002
              Relay_Log_Pos: 63100
      Relay_Master_Log_File: master1.000001
           Slave_IO_Running: Yes
          Slave_SQL_Running: No
            Replicate_Do_DB:
        Replicate_Ignore_DB:
         Replicate_Do_Table:
     Replicate_Ignore_Table:
    Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
                 Last_Errno: 4350
                 Last_Error: Error in Delete_rows event: commit of row events failed
               Skip_Counter: 0
        Exec_Master_Log_Pos: 33502184
            Relay_Log_Space: 33483879
            Until_Condition: None
             Until_Log_File:
              Until_Log_Pos: 0
         Master_SSL_Allowed: No
         Master_SSL_CA_File:
         Master_SSL_CA_Path:
            Master_SSL_Cert:
          Master_SSL_Cipher:
             Master_SSL_Key:
      Seconds_Behind_Master: NULL
1 row in set (0.00 sec)
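The key fields in the output above are Slave_IO_Running: Yes together with Slave_SQL_Running: No, plus Last_Errno and Exec_Master_Log_Pos. A small sketch for picking these out of the status output is shown below. The function name `slave_summary` and the wording of its summary line are invented for this example; the field names are those shown in the report.

```shell
#!/bin/sh
# Sketch only: summarise SHOW SLAVE STATUS\G output read from stdin.
# slave_summary is a hypothetical helper, not part of MySQL.

slave_summary() {
    awk -F': ' '
        { gsub(/^[ \t]+/, "", $1) }            # strip the right-alignment padding
        $1 == "Slave_IO_Running"    { io  = $2 }
        $1 == "Slave_SQL_Running"   { sql = $2 }
        $1 == "Last_Errno"          { err = $2 }
        $1 == "Exec_Master_Log_Pos" { pos = $2 }
        END {
            if (io == "Yes" && sql == "No")
                printf "SQL thread stopped: errno=%s exec_pos=%s\n", err, pos
            else
                printf "io=%s sql=%s\n", io, sql
        }'
}

slave_summary <<'EOF'
           Slave_IO_Running: Yes
          Slave_SQL_Running: No
                 Last_Errno: 4350
        Exec_Master_Log_Pos: 33502184
EOF
```

In practice the input would come from the mysql client, for example by piping the output of the status statement into the function.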

I have only seen this once. With the addition of the new Perl script, I hope not to run into issues like this. If I see it again, I will reopen the bug.