Bug #38241 replication failure with a very simple mix of INSERT / UPDATE / DELETE
Submitted: 19 Jul 2008 17:24 Modified: 25 Sep 2008 16:24
Reporter: Philip Stoev
Status: Can't repeat
Category: MySQL Server: Row Based Replication (RBR) Severity: S1 (Critical)
Version:5.1 OS:Any
Assigned to: Mats Kindahl CPU Architecture:Any

[19 Jul 2008 17:24] Philip Stoev
Description:
When a very simple mix of INSERT / UPDATE / DELETE queries, bundled in short transactions, is executed against InnoDB in two or more threads, row-based replication quickly fails with the following message:

080719 20:16:32 [ERROR] Slave SQL: Could not execute Update_rows event on table test.C; Can't find record in 'C', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log master-bin.000001, end_log_pos 286452, Error_code: 1032
080719 20:16:32 [Warning] Slave: Can't find record in 'C' Error_code: 1032
080719 20:16:32 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'master-bin.000001' position 286209

The test case involves nothing besides extremely simple INSERT / UPDATE / DELETE statements and BEGIN/COMMIT. The queries are of the form:

UPDATE C SET `int_key` = $random_digit;
DELETE FROM C WHERE `int_key` = $random_digit;
INSERT INTO C ( `int_key` ) VALUES ( $random_digit );
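To make the shape of the workload concrete, here is a minimal sketch of how one thread's query stream could be generated from the three query forms above. It is an assumption-laden illustration, not the actual grammar: the contents of bug38241.yy are not reproduced here, and the helper name `random_transaction`, the 1 to 3 statements per transaction, and the fixed seed are all hypothetical choices.

```python
import random

# Hypothetical sketch: the three query forms quoted in the report.
# The real grammar file (bug38241.yy) is not reproduced here.
QUERY_TEMPLATES = (
    "UPDATE C SET `int_key` = {d};",
    "DELETE FROM C WHERE `int_key` = {d};",
    "INSERT INTO C ( `int_key` ) VALUES ( {d} );",
)


def random_transaction(rng: random.Random, max_statements: int = 3) -> list[str]:
    """Return one short transaction: BEGIN, 1..max_statements queries, COMMIT.

    The transaction length bound is an assumption, not taken from the grammar.
    """
    body = [
        # $random_digit in the report is modelled as a digit in 0..9.
        rng.choice(QUERY_TEMPLATES).format(d=rng.randint(0, 9))
        for _ in range(rng.randint(1, max_statements))
    ]
    return ["BEGIN;", *body, "COMMIT;"]


if __name__ == "__main__":
    rng = random.Random(38241)  # arbitrary seed for a repeatable stream
    for _ in range(3):
        print("\n".join(random_transaction(rng)))
```

In the reported setup, two or more such streams run concurrently against the same InnoDB table on the master; the concurrency, not any single statement, is what exposes the replication failure.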

How to repeat:
A test case will be uploaded shortly.
[19 Jul 2008 17:41] Philip Stoev
Grammar file for bug 38241.yy

Attachment: bug38241.yy (application/octet-stream, text), 315 bytes.

[19 Jul 2008 17:44] Philip Stoev
To reproduce this bug, please clone the mysql-test-extra-6.0 tree and execute:

$ cd mysql-test-extra-6.0/mysqltest/gentest
$ perl runall.pl \
  --basedir=/path/to/mysql-5.1 \
  --rpl_mode=row \
  --grammar=/location/of/bug38241.yy \
  --threads=2 \
  --engine=innodb \
  --mysqld=--innodb_lock_wait_timeout=1

This will set up replication and execute queries generated from the grammar file in two threads. Shortly after startup, the framework will detect that replication has failed and the test will terminate.
[21 Jul 2008 10:40] Sveta Smirnova
Thank you for the report.

Verified as described.

I had to run the provided test twice to repeat the failure.
[21 Jul 2008 10:43] Philip Stoev
Yes, you are right; 2 threads may have been overly optimistic. Using more threads increases the likelihood of observing the problem.
[21 Jul 2008 11:01] Philip Stoev
Also present in 5.1.26
[21 Jul 2008 11:08] Philip Stoev
Also present in 5.1.23a
[21 Jul 2008 11:48] MySQL Verification Team
Sounds the same as bug #37317? I saw these errors after a lock wait timeout or deadlock.
[21 Jul 2008 15:08] Philip Stoev
Shane, maybe you are right. However, I recall getting that error even with queries that did not generate deadlocks or timeouts.
[25 Sep 2008 9:45] Mats Kindahl
BUG#37317 is a duplicate of BUG#32709 and it is triggered by certain statements failing for transactional tables. A deadlock is not necessary, just a failing statement on a transactional table.
[25 Sep 2008 16:24] Sveta Smirnova
The bug is not repeatable with current sources.