Bug #49186 Concurrent insert/select causes slave abort if select is under read committed
Submitted: 29 Nov 2009 13:52 Modified: 14 Dec 2009 20:46
Reporter: Elena Stepanova Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:5.1.41, 5.5.0 OS:Any
Assigned to: Alfranio Tavares Correia Junior CPU Architecture:Any

[29 Nov 2009 13:52] Elena Stepanova
Description:
If one connection performs insert into a MyISAM table while reading from an InnoDB table under READ-COMMITTED isolation level, and another connection inserts into the InnoDB table, a race condition causes discrepancy between master and slave contents, and eventually slave SQL abort.

MBR is affected (SBR is too, but that could be expected).

In the test case below, connection master1 performs a slow insert into a MyISAM table t2 while doing select from an InnoDB table t1. By the time the statement is issued, the desired row in the InnoDB table does not exist yet; but while the query is performed, connection master1 inserts the row into the table. The transaction from 'master1' is written into binlog before the transaction from 'master'. When the transaction from 'master' is finished, it is written into binlog without INSERT_ID (as no records were inserted). On slave, however, the row is inserted into t2 because there the insert into t1 happens earlier.

Then, connection 'master' performs a plain insert into the table t2. Now the binlog event is written with explicit INSERT_ID=1, which is correct for the master, but not for the slave, and it makes slave SQL thread abort with the duplicate key error.

How to repeat:
# t/rpl_slow_insert.test
# run as 
# perl ./mysql-test-run.pl --mysqld=--innodb \
# --mysqld=--binlog_format=mixed rpl_slow_insert

--source include/master-slave.inc
--source include/have_innodb.inc

connection default;

create table t1 
( i int not null auto_increment primary key, f int ) 
engine = InnoDB;
create table t2 
( i int not null auto_increment primary key, f int ) 
engine = MyISAM;

insert into t1 (f) values (1);

connection master;

set session transaction isolation level read committed;    
send insert into t2 (f) 
select f from t1 
where i = 2+sleep(1);

connection master1;

set session transaction isolation level repeatable read;
insert into t1 (f) values (2);

connection master;
reap;
insert into t2 (f) values (2);

sync_slave_with_master;

Suggested fix:
If t2 is an InnoDB table, binary logging switches into ROW format (as described in http://dev.mysql.com/doc/refman/5.4/en/binary-log-mixed.html) and the problem does not happen. However in the case above an InnoDB table is only involved into reading, but is not updated, and apparently it is not enough for the switch.

Maybe MBR should switch to ROW format in cases like these as well (or, they could be described as yet another limitation, and/or a warning could be produced).
[29 Nov 2009 14:49] Valeriy Kravchuk
Verified just as described with recent mysql-trunk from bzr on Mac OS X:

77-52-12-228:mysql-test openxs$ ./mysql-test-run.pl --mysqld=--innodb --mysqld=--binlog_format=mixed --suite=rpl bug_49186
Logging: ./mysql-test-run.pl  --mysqld=--innodb --mysqld=--binlog_format=mixed --suite=rpl bug_49186
091129 16:47:01 [Warning] Setting lower_case_table_names=2 because file system for /var/folders/dX/dXCzvuSlHX4Op1g-o1jIWk+++TI/-Tmp-/oIhE61J6nv/ is case insensitive
091129 16:47:01 [Note] Plugin 'FEDERATED' is disabled.
091129 16:47:01 [Note] Plugin 'ndbcluster' is disabled.
MySQL Version 5.5.0
Using binlog format 'mixed'
Checking supported features...
 - using ndbcluster when necessary, mysqld supports it
 - SSL connections supported
 - binaries are debug compiled
Collecting tests...
 - adding combinations for rpl
vardir: /Users/openxs/dbs/trunk/mysql-test/var
Checking leftover processes...
Removing old var directory...
Creating var directory '/Users/openxs/dbs/trunk/mysql-test/var'...
Installing system database...
Using server port 54665

==============================================================================

TEST                                      RESULT   TIME (ms)
------------------------------------------------------------

worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 13000..13009
stop slave;
drop table if exists t1,t2,t3,t4,t5,t6,t7,t8,t9;
reset master;
reset slave;
drop table if exists t1,t2,t3,t4,t5,t6,t7,t8,t9;
start slave;
create table t1 
( i int not null auto_increment primary key, f int ) 
engine = InnoDB;
create table t2 
( i int not null auto_increment primary key, f int ) 
engine = MyISAM;
insert into t1 (f) values (1);
set session transaction isolation level read committed;
insert into t2 (f) 
select f from t1 
where i = 2+sleep(1);
set session transaction isolation level repeatable read;
insert into t1 (f) values (2);
insert into t2 (f) values (2);
rpl.bug_49186 'row'                      [ pass ]   3062

MTR's internal check of the test case 'rpl.bug_49186' failed.
This means that the test case does not preserve the state that existed
before the test case was executed.  Most likely the test case did not
do a proper clean-up.
This is the diff of the states of the servers before and after the
test case was executed:
mysqltest: Logging to '/Users/openxs/dbs/trunk/mysql-test/var/tmp/check-mysqld_2.log'.
mysqltest: Results saved in '/Users/openxs/dbs/trunk/mysql-test/var/tmp/check-mysqld_2.result'.
mysqltest: Connecting to server localhost:13001 (socket /Users/openxs/dbs/trunk/mysql-test/var/tmp/mysqld.2.sock) as 'root', connection 'default', attempt 0 ...
mysqltest: ... Connected.
mysqltest: Start processing test commands from './include/check-testcase.test' ...
mysqltest: ... Done processing test commands.
--- /Users/openxs/dbs/trunk/mysql-test/var/tmp/check-mysqld_2.result	2009-11-29 17:47:06.000000000 +0300
+++ /Users/openxs/dbs/trunk/mysql-test/var/tmp/check-mysqld_2.reject	2009-11-29 17:47:10.000000000 +0300
@@ -309,6 +309,8 @@
 def	mysql	latin1	latin1_swedish_ci	NULL
 def	test	latin1	latin1_swedish_ci	NULL
 tables_in_test
+t1
+t2
 tables_in_mysql
 mysql.columns_priv
 mysql.db

mysqltest: Result content mismatch

not ok

stop slave;
drop table if exists t1,t2,t3,t4,t5,t6,t7,t8,t9;
reset master;
reset slave;
drop table if exists t1,t2,t3,t4,t5,t6,t7,t8,t9;
start slave;
create table t1 
( i int not null auto_increment primary key, f int ) 
engine = InnoDB;
create table t2 
( i int not null auto_increment primary key, f int ) 
engine = MyISAM;
insert into t1 (f) values (1);
set session transaction isolation level read committed;
insert into t2 (f) 
select f from t1 
where i = 2+sleep(1);
set session transaction isolation level repeatable read;
insert into t1 (f) values (2);
insert into t2 (f) values (2);
rpl.bug_49186 'stmt'                     [ fail ]
        Test ended at 2009-11-29 16:47:21

CURRENT_TEST: rpl.bug_49186
=== SHOW MASTER STATUS ===
---- 1. ----
File	slave-bin.000001
Position	966
Binlog_Do_DB	
Binlog_Ignore_DB	
==========================

=== SHOW SLAVE STATUS ===
---- 1. ----
Slave_IO_State	Waiting for master to send event
Master_Host	127.0.0.1
Master_User	root
Master_Port	13000
Connect_Retry	1
Master_Log_File	master-bin.000001
Read_Master_Log_Pos	1076
Relay_Log_File	slave-relay-bin.000003
Relay_Log_Pos	1103
Relay_Master_Log_File	master-bin.000001
Slave_IO_Running	Yes
Slave_SQL_Running	No
Replicate_Do_DB	
Replicate_Ignore_DB	
Replicate_Do_Table	
Replicate_Ignore_Table	
Replicate_Wild_Do_Table	
Replicate_Wild_Ignore_Table	
Last_Errno	1062
Last_Error	Error 'Duplicate entry '1' for key 'PRIMARY'' on query. Default database: 'test'. Query: 'insert into t2 (f) values (2)'
Skip_Counter	0
Exec_Master_Log_Pos	956
Relay_Log_Space	1379
Until_Condition	None
Until_Log_File	
Until_Log_Pos	0
Master_SSL_Allowed	No
Master_SSL_CA_File	
Master_SSL_CA_Path	
Master_SSL_Cert	
Master_SSL_Cipher	
Master_SSL_Key	
Seconds_Behind_Master	
Master_SSL_Verify_Server_Cert	No
Last_IO_Errno	0
Last_IO_Error	
Last_SQL_Errno	1062
Last_SQL_Error	Error 'Duplicate entry '1' for key 'PRIMARY'' on query. Default database: 'test'. Query: 'insert into t2 (f) values (2)'
Replicate_Ignore_Server_Ids	
Master_Server_Id	1
...
[29 Nov 2009 14:56] Valeriy Kravchuk
Same with 5.1.42 from bzr:

...
rpl.bug_49186 'mix'                      [ fail ]
        Test ended at 2009-11-29 16:53:34

CURRENT_TEST: rpl.bug_49186
=== SHOW MASTER STATUS ===
---- 1. ----
File	slave-bin.000001
Position	965
Binlog_Do_DB	
Binlog_Ignore_DB	
==========================

=== SHOW SLAVE STATUS ===
---- 1. ----
Slave_IO_State	Waiting for master to send event
Master_Host	127.0.0.1
Master_User	root
Master_Port	13000
Connect_Retry	1
Master_Log_File	master-bin.000001
Read_Master_Log_Pos	1075
Relay_Log_File	slave-relay-bin.000003
Relay_Log_Pos	1101
Relay_Master_Log_File	master-bin.000001
Slave_IO_Running	Yes
Slave_SQL_Running	No
Replicate_Do_DB	
Replicate_Ignore_DB	
Replicate_Do_Table	
Replicate_Ignore_Table	
Replicate_Wild_Do_Table	
Replicate_Wild_Ignore_Table	
Last_Errno	1062
Last_Error	Error 'Duplicate entry '1' for key 'PRIMARY'' on query. Default database: 'test'. Query: 'insert into t2 (f) values (2)'
Skip_Counter	0
Exec_Master_Log_Pos	955
Relay_Log_Space	1376
Until_Condition	None
Until_Log_File	
Until_Log_Pos	0
Master_SSL_Allowed	No
Master_SSL_CA_File	
Master_SSL_CA_Path	
Master_SSL_Cert	
Master_SSL_Cipher	
Master_SSL_Key	
Seconds_Behind_Master	
Master_SSL_Verify_Server_Cert	No
Last_IO_Errno	0
Last_IO_Error	
Last_SQL_Errno	1062
Last_SQL_Error	Error 'Duplicate entry '1' for key 'PRIMARY'' on query. Default database: 'test'. Query: 'insert into t2 (f) values (2)'
=========================

=== SHOW PROCESSLIST ===
---- 1. ----
Id	2
User	root
Host	localhost:54754
db	test
Command	Query
Time	0
State	
Info	SHOW PROCESSLIST
---- 2. ----
Id	3
User	root
Host	localhost:54755
db	test
Command	Sleep
Time	5
State	
Info	
---- 3. ----
Id	4
User	system user
Host	
db	
Command	Connect
Time	4
State	Waiting for master to send event
Info	
========================

analyze: sync_with_master
mysqltest: At line 31: sync_slave_with_master failed: 'select master_pos_wait('master-bin.000001', 1075, 300)' returned NULL indicating slave SQL thread failure
...
[2 Dec 2009 17:45] Andrei Elkin
This particular problem of reverse to execution binlogging was solved by 
alfranio.correia@sun.com-20091103190256-637o8qxlveikrt3i et all 
in mysql-5.1-rep+3.

The symptom of description
" The transaction from 'master1' is written into binlog before the
transaction from 'master' "
can still be true *but* INSERT-N..SELECT-T query (trans master)
is binlogged in the ROW-format and that makes difference.
Namely, in the regression test case two binlog orders

master1-trans
rows-events-of-master

rows-events-of-master
master1-trans

are equivalent, i.e the slave data would be the same.
Could you please re-verify this case against rep+3 tree.

And very nice test, thanks!
[2 Dec 2009 19:17] Andrei Elkin
At verifying on mysql-5.1-rep+3 make sure to check two versions of

rows-events-of-master
master1-trans

sequence. One with rows-events-of-master is empty (that the test would
generate [instead of effectless query in 5.1]) 
and 2nd with rows-events-of-master is something. You would
have slightly modify the test for it.
[2 Dec 2009 19:31] Elena Stepanova
Already did, both scenarios seem to be working all right on mysql-5.1-rep+3. 
I'll also run a short stress test with the initial flow just in case it reveals any variations of the scenario which are still affected. The probability of this race condition during the standard stress test is not very high though.