MySQL Bugs: #34447: replication failure on DELETE after INSERT on an unrelated table

Bug #34447	replication failure on DELETE after INSERT on an unrelated table
Submitted:	10 Feb 2008 14:46	Modified:	19 Mar 2008 7:12
Reporter:	Philip Stoev	Email Updates:
Status:	Duplicate	Impact on me:	None
Category:	MySQL Server: Replication	Severity:	S1 (Critical)
Version:	5.1-BK	OS:	Any
Assigned to:	Assigned Account	CPU Architecture:	Any

Description:
Mixed-mode replication fails with "Can't find record" if there is an INSERT into an unrelated table.

Both MyISAM and Maria are affected. Both the mysql-5.1 and the mysql-maria trees are affected.

The bug shows when this procedure is being run in a tight loop with two concurrent threads:

BEGIN
   DECLARE ins_count INT DEFAULT 100;
   DECLARE del_count INT;
   DECLARE cur_user VARCHAR(255);
   DECLARE local_uuid VARCHAR(255);
   DECLARE local_time TIMESTAMP;

   SET local_time= NOW();
   SET cur_user = CURRENT_USER();
   SET local_uuid=UUID();

   WHILE ins_count > 0 DO
     INSERT INTO test.insdel2_tbl VALUES (NULL, NOW(), USER(), UUID(), ins_count,'Going to test MBR for MySQL');
     INSERT INTO test.insdel2_sub VALUES (NULL,ins_count, 'More text to test with');
     SET ins_count = ins_count - 1;
   END WHILE;

   SELECT MAX(id) FROM test.insdel2_tbl INTO del_count;
   WHILE del_count > 0 DO
     DELETE FROM test.insdel2_tbl WHERE id = del_count;
   SET del_count = del_count - 2;
   END WHILE;
END

This procedure basically inserts 100 rows into two tables, and then deletes some of them from the first table. Both the second INSERT and the DELETE must be present for the bug to show up.

SHOW SLAVE STATUS shows:

               Slave_IO_State: Waiting for master to send event
                  Master_Host: 127.0.0.1
                  Master_User: root
                  Master_Port: 9306
                Connect_Retry: 1
              Master_Log_File: master-bin.000001
          Read_Master_Log_Pos: 161461
               Relay_Log_File: slave-relay-bin.000002
                Relay_Log_Pos: 98054
        Relay_Master_Log_File: master-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 1032
                   Last_Error: Can't find record in 'insdel2_tbl'
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 97908
              Relay_Log_Space: 161762
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 1032
               Last_SQL_Error: Can't find record in 'insdel2_tbl'

slave error log says:

080210 17:42:13 [ERROR] Slave SQL: Could not execute Delete_rows event on table test.insdel2_tbl; handler error HA_ERR_KEY_NOT_FOUND; the event's master log master-bin.000001, end_log_pos 98091, Error_code: 0
080210 17:42:13 [ERROR] Slave SQL: Can't find record in 'insdel2_tbl', Error_code: 1032
080210 17:42:13 [Warning] Slave: Can't find record in 'insdel2_tbl' Error_code: 1032
080210 17:42:13 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'master-bin.000001' position 97908

How to repeat:
Test case will follow shortly.

Test case for bug 34447

Attachment: bug34447.zip (application/x-zip-compressed, text), 1.31 KiB.

Please run the attached test case as follows:

1. Unpack the zip file so that the .txt files are placed in mysql-test and the .test files are placed in mysql-test/t.

2. Start a mysql server in replication:

$ perl ./mysql-test-run.pl --start-and-exit --skip-ndb rpl_alter

3. Run test case against that server:

$ perl ./mysql-test-run.pl --stress --stress-init-file=bug34447_init.txt --stress-test-file=bug34447_run.txt --stress-threads=2 --stress-test-duration=600 --extern --socket=var/tmp/master.sock --user=root

4. In a separate mysql client, issue SHOW PROCESSLIST. You will see that replication will fail very shortly after starting the test.

If the test is run with --stress-threads=2, no failures are observed. If either the DELETE or the second INSERT query is removed, no failures either.

Final comment should read "if test is run with --stress-test-threads=1, no errors are observed", that is, it is this bug is a concurrency issue since it does not show itself up when only one client is running.

Verified as described.

triage: need e/r values to determine target

dup of bug#29020