Bug #47835 MBR replication failure under concurrent load with OFFSET in subselect
Submitted: 5 Oct 2009 13:19  Modified: 17 Dec 2010 0:04
Reporter: Elena Stepanova
Status: Can't repeat
Category: MySQL Server: Replication  Severity: S3 (Non-critical)
Version: 5.1.37sp1  OS: Any
Assigned to:  CPU Architecture: Any

[5 Oct 2009 13:19] Elena Stepanova
Description:
Configuration/flow:

A multi-threaded stress test performs concurrent INSERTs, UPDATEs, DELETEs, and SELECTs on a MyISAM table tb1 with an auto-increment PK. Among other operations, one test flow includes the following:

- creates a temporary table t1_tmp with default engine;
- adds an auto-increment PK to the table;
- in a loop until the temporary table contains 40 rows:
  - counts rows in tb1;
  - picks a random offset value <offset> from the [0..<count>] range*;
  - executes INSERT INTO t1_tmp (<non-PK fields>) SELECT <non-PK fields> FROM tb1 WHERE <pk> = ( SELECT <pk> FROM tb1 LIMIT 1 OFFSET <offset> );

*This is a test error; the upper bound should be <count-1>, but it is irrelevant to the problem.
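The flow above can be sketched single-threaded, using Python's sqlite3 in place of MySQL/MyISAM; tb1 and t1_tmp come from the report, while the column name f1 and the 100-row fill are assumptions:

```python
import random
import sqlite3

# Single-threaded sketch of the test flow, with sqlite3 standing in for
# MySQL/MyISAM. tb1 and t1_tmp are from the report; f1 and the initial
# 100-row fill are assumptions.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE tb1 (pk INTEGER PRIMARY KEY AUTOINCREMENT, f1 TEXT)")
cur.executemany("INSERT INTO tb1 (f1) VALUES (?)",
                [("row%d" % i,) for i in range(100)])

cur.execute("CREATE TEMPORARY TABLE t1_tmp "
            "(pk INTEGER PRIMARY KEY AUTOINCREMENT, f1 TEXT)")

while True:
    cur.execute("SELECT COUNT(*) FROM tb1")
    count = cur.fetchone()[0]
    # [0..count] reproduces the test's off-by-one: when offset == count the
    # subselect yields NULL, WHERE pk = NULL matches nothing, and the INSERT
    # adds no row -- the same shape as step 24 in the report.
    offset = random.randint(0, count)
    cur.execute("INSERT INTO t1_tmp (f1) "
                "SELECT f1 FROM tb1 WHERE pk = "
                "(SELECT pk FROM tb1 LIMIT 1 OFFSET ?)", (offset,))
    cur.execute("SELECT COUNT(*) FROM t1_tmp")
    if cur.fetchone()[0] >= 40:
        break
```

Note that the subselect has no ORDER BY, so which row a given OFFSET lands on is formally unspecified; the sketch ignores this because it runs single-threaded against a static table.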

More notes:
- the test was run in a replication context, with mixed-format binary logging (MBR);
- the default engine is MyISAM;
- tb1 has AFTER INSERT/DELETE/UPDATE triggers which insert records into another MyISAM table;
- tb1 is partitioned by HASH(<pk>) into 2 partitions.

The problem:

At some point the slave SQL thread aborted with a duplicate-key error saying that there was an attempt to insert an already existing PK value (24) into t1_tmp.

Investigation:

Due to dynamic log cleanup, and because the slave was almost 30 minutes behind the master, the master log, the master binlogs, and the slave relay logs prior to the one where the SQL thread stopped were not available for investigation. The current slave relay log showed that the master server had binlogged 23 inserts with different offset values ranging from 1177 to 5595, each time with an increasing INSERT_ID value.
At the 23rd step the offset value was 2128.
At the 24th step it logged an insert with offset value 836, but *without* an INSERT_ID value in the binlog, which suggests that no rows were inserted.
At the 25th step it logged an insert with offset value 1786 and with INSERT_ID = 24, and that was the statement which caused the replication failure.

All loop steps were performed in less than 1 second.

Since the 25th insert was successful on the master, the 24th insert really must not have happened (as opposed to having been binlogged wrongly with a missing INSERT_ID). The only explanation I can think of is that between the 23rd and 24th steps some other connection(s) removed at least ~1300 records from tb1, removed even more between the COUNT(*) and the INSERT at the 24th step, and then inserted almost 1000 records at the very next moment (between steps 24 and 25). All of this is theoretically possible in the stress-test workflow, although the probability is very low.
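The "~1300 removed" and "almost 1000 inserted" figures follow from the logged offsets alone; a minimal arithmetic sketch (only the three offsets come from the relay log, the bounds reasoning is this sketch's own):

```python
# Arithmetic check of the row-count bounds implied by the binlog offsets
# 2128, 836, and 1786 from the report.

def min_rows_for_hit(offset):
    # LIMIT 1 OFFSET n returns a row only if the table has at least n+1 rows
    return offset + 1

rows_at_step23 = min_rows_for_hit(2128)  # >= 2129 rows when step 23 hit
rows_at_step24 = 836                     # OFFSET 836 missed -> <= 836 rows
rows_at_step25 = min_rows_for_hit(1786)  # >= 1787 rows when step 25 hit

min_deleted = rows_at_step23 - rows_at_step24   # the "~1300" deleted rows
min_inserted = rows_at_step25 - rows_at_step24  # the "almost 1000" inserts
print(min_deleted, min_inserted)  # 1293 951
```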

However, the massive delete(s) between steps 23 and 24 apparently did not make it into the slave relay log: they could not be found in the log available for investigation, nor did the slave server perform them, since, unlike the master, it successfully inserted a row at step 24, which then caused the failure at step 25.

So, the cause of the problem might be one of the following:
- the DELETEs from tb1 were not binlogged, which caused the difference in table contents;
- the subselect with OFFSET worked wrongly at step 24 on the master and did not return a record when it should have;
- the INSERT at step 24 was binlogged incorrectly (e.g. there was an error on the master during the statement, but it was still binlogged with error-code = 0);
- the DELETEs were lost somewhere between the master binlog and the slave relay log.

How to repeat:
This happened only once; I have not been able to reproduce it so far.
[17 Dec 2010 0:04] Elena Stepanova
It has never been observed again, although the tests are run on a daily basis on several trees and platforms. I assume it was either the same problem as in bug#49186 (which, although set to 'Can't repeat' status, was in fact fixed), or a similar one. However, since this bug has never had a reproducible test case, I am closing it as 'Can't repeat'.