Bug #34447 replication failure on DELETE after INSERT on an unrelated table
Submitted: 10 Feb 2008 14:46 Modified: 19 Mar 2008 7:12
Reporter: Philip Stoev Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: Replication Severity:S1 (Critical)
Version:5.1-BK OS:Any
Assigned to: Assigned Account CPU Architecture:Any

[10 Feb 2008 14:46] Philip Stoev
Description:
Mixed-mode replication fails with "Can't find record" if there is an INSERT into an unrelated table.

Both MyISAM and Maria are affected. Both the mysql-5.1 and the mysql-maria trees are affected.

The bug shows when this procedure is being run in a tight loop with two concurrent threads:

BEGIN
   DECLARE ins_count INT DEFAULT 100;
   DECLARE del_count INT;
   DECLARE cur_user VARCHAR(255);
   DECLARE local_uuid VARCHAR(255);
   DECLARE local_time TIMESTAMP;

   SET local_time= NOW();
   SET cur_user = CURRENT_USER();
   SET local_uuid=UUID();

   WHILE ins_count > 0 DO
     INSERT INTO test.insdel2_tbl VALUES (NULL, NOW(), USER(), UUID(), ins_count,'Going to test MBR for MySQL');
     INSERT INTO test.insdel2_sub VALUES (NULL,ins_count, 'More text to test with');
     SET ins_count = ins_count - 1;
   END WHILE;

   SELECT MAX(id) FROM test.insdel2_tbl INTO del_count;
   WHILE del_count > 0 DO
     DELETE FROM test.insdel2_tbl WHERE id = del_count;
   SET del_count = del_count - 2;
   END WHILE;
END

This procedure basically inserts 100 rows into two tables, and then deletes some of them from the first table. Both the second INSERT and the DELETE must be present for the bug to show up.

SHOW SLAVE STATUS shows:

               Slave_IO_State: Waiting for master to send event
                  Master_Host: 127.0.0.1
                  Master_User: root
                  Master_Port: 9306
                Connect_Retry: 1
              Master_Log_File: master-bin.000001
          Read_Master_Log_Pos: 161461
               Relay_Log_File: slave-relay-bin.000002
                Relay_Log_Pos: 98054
        Relay_Master_Log_File: master-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 1032
                   Last_Error: Can't find record in 'insdel2_tbl'
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 97908
              Relay_Log_Space: 161762
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 1032
               Last_SQL_Error: Can't find record in 'insdel2_tbl'

slave error log says:

080210 17:42:13 [ERROR] Slave SQL: Could not execute Delete_rows event on table test.insdel2_tbl; handler error HA_ERR_KEY_NOT_FOUND; the event's master log master-bin.000001, end_log_pos 98091, Error_code: 0
080210 17:42:13 [ERROR] Slave SQL: Can't find record in 'insdel2_tbl', Error_code: 1032
080210 17:42:13 [Warning] Slave: Can't find record in 'insdel2_tbl' Error_code: 1032
080210 17:42:13 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'master-bin.000001' position 97908

How to repeat:
Test case will follow shortly.
[10 Feb 2008 14:50] Philip Stoev
Test case for bug 34447

Attachment: bug34447.zip (application/x-zip-compressed, text), 1.31 KiB.

[10 Feb 2008 14:52] Philip Stoev
Please run the attached test case as follows:

1. Unpack the zip file so that the .txt files are placed in mysql-test and the .test files are placed in mysql-test/t.

2. Start a mysql server in replication:

$ perl ./mysql-test-run.pl --start-and-exit --skip-ndb rpl_alter

3. Run test case against that server:

$ perl ./mysql-test-run.pl --stress --stress-init-file=bug34447_init.txt --stress-test-file=bug34447_run.txt --stress-threads=2 --stress-test-duration=600 --extern --socket=var/tmp/master.sock --user=root

4. In a separate mysql client, issue SHOW PROCESSLIST. You will see that replication will fail very shortly after starting the test.

If the test is run with --stress-threads=2, no failures are observed. If either the DELETE or the second INSERT query is removed, no failures either.
[12 Feb 2008 10:13] Philip Stoev
Final comment should read "if test is run with --stress-test-threads=1, no errors are observed", that is, it is this bug is a concurrency issue since it does not show itself up when only one client is running.
[12 Feb 2008 18:54] Susanne Ebrecht
Verified as described.
[13 Feb 2008 22:16] Omer Barnir
triage: need e/r values to determine target
[19 Mar 2008 7:12] Zhenxing He
dup of bug#29020