MySQL Bugs: #50413: insert on duplicate key update sometimes writes binlog position incorrectly

Bug #50413	insert on duplicate key update sometimes writes binlog position incorrectly
Submitted:	18 Jan 2010 13:54	Modified:	12 Nov 2013 12:50
Reporter:	justin dahlke	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S1 (Critical)
Version:	5.0.91,5.1.50,5.6.0	OS:	Any
Assigned to:	Sunny Bains	CPU Architecture:	Any

Description:
I believe under write heavy load the master db is writing insert on duplicate key update statements to the binlog incorrectly.  

It occurs many times a day, but it's def a race condition. After running a maatkit checksum on the subscriberfield table on master/slave we found 22% of batches of 100k rows aren't consistent (so prob overall <1%). There's currently ~100 mil rows in that table. Here's an example:

On the master

	select * from subscriberfield where subscriber_id = 90615540 and field_id = 42846;
	+------------+---------------+----------+------------+-------+
	| id         | subscriber_id | field_id | content    | mtime |
	+------------+---------------+----------+------------+-------+
	| 1839528432 |      90615540 |    42846 | francois | NULL  | 
	+------------+---------------+----------+------------+-------+

Here's how it replicates to the slave

	select * from subscriberfield where subscriber_id = 90615540 and field_id = 42846;
	+------------+---------------+----------+---------+-------+
	| id         | subscriber_id | field_id | content | mtime |
	+------------+---------------+----------+---------+-------+
	| 1839528432 |      90615540 |    42846 | 78633   | NULL  | 
	+------------+---------------+----------+---------+-------+

After inspecting the bin logs, I think mysql is writing the incorrect SET INSERT_ID into the binlog occasionally for insert on duplicate key update statements.

We have a multisharded environment and I've been able to see this same problem on all of our shards. Our master/slave pairs are running 5.0.77 which is what I did this troubleshooting on. We also have some shards with a 2nd slave running 5.1.34 which exhibit the exact same symptoms.   

Bug might be related to bug #44976

How to repeat:
To troubleshoot we brought up a new slave taken from a backup and traced the state of a row we knew would replicate incorrectly. 

To locate the insert statement we trolled through the bin logs on the slave and found:

	# at 97813391
	#100114 11:09:31 server id 13  end_log_pos 97813419 	Intvar
	SET INSERT_ID=1839528432/*!*/;
	# at 97813419
	#100114 11:09:31 server id 13  end_log_pos 97813767 	Query	thread_id=44827176	exec_time=1	error_code=0
	use mail_app/*!*/;
	SET TIMESTAMP=1263485371/*!*/;
	INSERT INTO subscriberfield 
	                                SET subscriber_id = '90615540', 
	                                field_id = '42846', 
	                                content = 'francois' 
	                                ON DUPLICATE KEY UPDATE content = 'francois'
	/*!*/;
	# at 97813767
	#100114 11:09:31 server id 13  end_log_pos 97813794 	Xid = 28190255
	COMMIT/*!*/;

Next, we caught replication on right up to the point where the insert had executed

	CHANGE MASTER TO MASTER_LOG_FILE='db-bin.000239', MASTER_LOG_POS= 324249907, MASTER_USER='******', MASTER_PASSWORD='******', MASTER_HOST='10.0.0.26', MASTER_PORT=3306;

	START SLAVE UNTIL MASTER_LOG_FILE='db-bin.000247', MASTER_LOG_POS= 97813760;

And there you have it, the 1st insert statement executed correctly

	mysql> select * from subscriberfield where subscriber_id = 90615540 and field_id = 42846;
	+------------+---------------+----------+------------+-------+
	| id         | subscriber_id | field_id | content    | mtime |
	+------------+---------------+----------+------------+-------+
	| 1839528432 |      90615540 |    42846 | francois | NULL  | 
	+------------+---------------+----------+------------+-------+
	1 row in set (0.00 sec)

We are now stepping through each commit one at a time... (I'll not copy/paste each one) 

Until now, when the data has changed on the slave:

	mysql> START SLAVE UNTIL MASTER_LOG_FILE='db-bin.000247', MASTER_LOG_POS=97832543;
	Query OK, 0 rows affected (0.00 sec)

	mysql> show slave status \G;
	*************************** 1. row ***************************
	             Slave_IO_State: Waiting for master to send event
	                Master_Host: 10.0.0.26
	                Master_User: *******
	                Master_Port: 3306
	              Connect_Retry: 60
	            Master_Log_File: db-bin.000283
	        Read_Master_Log_Pos: 295414923
	             Relay_Log_File: db-relay-bin.000025
	              Relay_Log_Pos: 97842972
	      Relay_Master_Log_File: db-bin.000247
	           Slave_IO_Running: Yes
	          Slave_SQL_Running: No
	            Replicate_Do_DB: 
	        Replicate_Ignore_DB: 
	         Replicate_Do_Table: 
	     Replicate_Ignore_Table: 
	    Replicate_Wild_Do_Table: 
	Replicate_Wild_Ignore_Table: 
	                 Last_Errno: 0
	                 Last_Error: 
	               Skip_Counter: 0
	        Exec_Master_Log_Pos: 97842835
	            Relay_Log_Space: 38950730252
	            Until_Condition: Master
	             Until_Log_File: db-bin.000247
	              Until_Log_Pos: 97832543
	         Master_SSL_Allowed: No
	         Master_SSL_CA_File: 
	         Master_SSL_CA_Path: 
	            Master_SSL_Cert: 
	          Master_SSL_Cipher: 
	             Master_SSL_Key: 
	      Seconds_Behind_Master: NULL
	1 row in set (0.00 sec)

	ERROR: 
	No query specified

	mysql> select * from subscriberfield where subscriber_id = 90615540 and field_id = 42846;
	+------------+---------------+----------+---------+-------+
	| id         | subscriber_id | field_id | content | mtime |
	+------------+---------------+----------+---------+-------+
	| 1839528432 |      90615540 |    42846 | 78633   | NULL  | 
	+------------+---------------+----------+---------+-------+
	1 row in set (0.00 sec)

"francois" has been swapped out for "78633". Just 5 or so commits later in the bin log we see this query:

	# at 97837246
	#100114 11:09:31 server id 13  end_log_pos 97837137     Intvar
	SET INSERT_ID=1839528432/*!*/;
	# at 97837274
	#100114 11:09:31 server id 13  end_log_pos 97837343     Query   thread_id=44827175      exec_time=1     error_code=0
	SET TIMESTAMP=1263485371/*!*/;
	insert into subscriberfield set subscriber_id = '84779527', field_id = '53781', content = '78633' on duplicate key update content = '78633'
	/*!*/;
	# at 97837480
	#100114 11:09:31 server id 13  end_log_pos 97837371     Intvar
	SET INSERT_ID=2133638344/*!*/;
	
	
Notice that subscriber_id & field_id are different, yet the content field bled over (in addition to creating it's own row). If you look at that table:

	CREATE TABLE `subscriberfield` (
	  `id` bigint(20) unsigned NOT NULL auto_increment,
	  `subscriber_id` bigint(20) unsigned NOT NULL,
	  `field_id` int(10) unsigned NOT NULL,
	  `content` text,
	  `mtime` datetime default NULL,
	  PRIMARY KEY  (`id`),
	  UNIQUE KEY `subscriberfield_subscriber_id_field_id_idx` (`subscriber_id`,`field_id`),
	  KEY `field_id` (`field_id`),
	  KEY `subscriberfield_field_id_content_idx` (`field_id`,`content`(10)),
	  CONSTRAINT `subscriberfield_ibfk_1` FOREIGN KEY (`subscriber_id`) REFERENCES `subscriber` (`id`),
	  CONSTRAINT `subscriberfield_ibfk_2` FOREIGN KEY (`field_id`) REFERENCES `field` (`id`)
	) ENGINE=InnoDB AUTO_INCREMENT=1839941935 DEFAULT CHARSET=latin1

There's a UNIQUE KEY on (`subscriber_id`,`field_id`) which should prevent that insert on duplicate key update from affecting a row with a different subscriber & field ID, yet it does.

The problem most likely lies in the primary key. Prior to executing that last query, the binlog has an entry:

	SET INSERT_ID=1839528432/*!*/;

That just so happens to be the primary key of our beloved francois

	mysql> select * from subscriberfield where subscriber_id = 90615540 and field_id = 42846;
	+------------+---------------+----------+---------+-------+
	| id         | subscriber_id | field_id | content | mtime |
	+------------+---------------+----------+---------+-------+
	| 1839528432 |      90615540 |    42846 | 78633   | NULL  | 
	+------------+---------------+----------+---------+-------+

Many thanks for writing a bug report.

MySQL 5.0 is out of support since a while.

We need to know if you have same problems with by using MySQL 5.1.

MySQL 5.0 is in support for some customers. I think I have been been encountering this frequently over the past few weeks. I wish this would have been investigated as we have lost a lot of time from this.

Hi Mark,

We never tried it on 5.1 b/c upgrading was going to be a big deal. What we ended up doing was a software hack. We realized that the block of code that was in a transaction didn't absolutely have to be in a transaction so long as we retried the queries when a deadlock occurred. It's a total hack which opened us up to some potential edge cases, but it also solved the replication problem which was more critical at the time. Hope that helps

point the attached testcase to the master server and run. 5.0 slave breaks quickly.

testcase.

Attachment: bug50413.c (text/plain), 6.86 KiB.

repeatable on 5.1.50 when master/slave have --innodb_autoinc_lock_mode=0

http://dev.mysql.com/doc/refman/5.1/en/innodb-auto-increment-handling.html
says  "Therefore, if you are using statement-based replication, you must either avoid INSERT ... ON DUPLICATE KEY UPDATE or use innodb_autoinc_lock_mode = 0 (“traditional” lock mode)."

in this case, binlogs are broken and slave breaks in 5.1/5.6

Awesome, so SBR is broken by default for MySQL 5.1 given the default value of innodb_autoinc_lock_mode.

Actually, I don't understand the docs at -- http://dev.mysql.com/doc/refman/5.1/en/innodb-auto-increment-handling.html. Paraphrasing some of what has been written there:

>>>
INSERT ... ON DUPLICATE KEY UPDATE is a "mixed-mode" insert
>>>

According to this, I don't have to use innodb_autoinc_lock_mode=0 with SBR.
>>>
innodb_autoinc_lock_mode = 0 (“traditional” lock mode)
...
This lock mode is provided only for backward compatibility and performance testing. There is little reason to use this lock mode unless you use “mixed-mode inserts” and care about the important difference in semantics described later.
>>>

The claim here is that I must use innodb_autoinc_lock_mode=0 for INSERT ...ON DUP KEY UPDATE. That is a bit stronger then the text in the previous section.
>>>
innodb_autoinc_lock_mode = 1 (“consecutive” lock mode)

A similar situation exists if you use INSERT ... ON DUPLICATE KEY UPDATE. This statement is also classified as a “mixed-mode insert” since an auto-increment value is not necessarily generated for each row. Because InnoDB allocates the auto-increment value before the insert is actually attempted, it cannot know whether an inserted value will be a duplicate of an existing value and thus cannot know whether the auto-increment value it generates will be used for a new row. Therefore, if you are using statement-based replication, you must either avoid INSERT ... ON DUPLICATE KEY UPDATE or use innodb_autoinc_lock_mode = 0 (“traditional” lock mode).
>>>

Can someone explain this?

Shane - your testcase shows corruption when innodb_autoinc_lock_mode=0 is used. But the docs state that innodb_autoinc_lock_mode=0 should be used to avoid problems when SBR  is also used.

I care about this because of corruption that occurs when innodb_autoinc_lock_mode=1.

Am I stuck?

I must restate my last comment. Switching to 5.1 has fixed my corruption problem that occurs for INSERT ... ON DUPLICATE KEY UPDATE statements when the binlog has incorrect SET INSERT_ID=... values.

And the switch was done using innodb_autoinc_lock_mode=1, so now I want to know why the docs warn that innodb_autoinc_lock_mode=0 should be used with SBR.

Here is a deterministic test case that reproduces the problem.

======== BEGIN rpl_bug-master.opt ========
--innodb_autoinc_lock_mode=2
======== END rpl_bug-master.opt ========
======== BEGIN rpl_bug.test ========
--source include/master-slave.inc
--source include/have_binlog_format_statement.inc
--source include/have_innodb.inc
CREATE TABLE t1 (a INT AUTO_INCREMENT PRIMARY KEY, b INT UNIQUE KEY, c INT) ENGINE=InnoDB;

# Transaction 1 on connection 1
BEGIN;
INSERT INTO t1 SET b=1, c=1 ON DUPLICATE KEY UPDATE b=VALUES(b), c=VALUES(c);
INSERT INTO t1 SET b=1, c=2 ON DUPLICATE KEY UPDATE b=VALUES(b), c=VALUES(c);

  # Transaction 2 on connection 2
  --connection master1
  BEGIN;
  INSERT INTO t1 SET b=2, c=3 ON DUPLICATE KEY UPDATE b=VALUES(b), c=VALUES(c);
  COMMIT;

--connection master
COMMIT;

SELECT * FROM t1;
--sync_slave_with_master
SELECT * FROM t1;
======== END rpl_bug.test ========

The test case causes slave to stop with the following error message:
Error 'Duplicate entry '1' for key 'b'' on query. Default database: 'test'. Query: 'INSERT INTO t1 SET b=1, c=2 ON DUPLICATE KEY UPDATE b=VALUES(b), c=VALUES(c)'

The test case generates the following binlog:
BEGIN
SET INSERT_ID=2;
INSERT INTO t1 SET b=2, c=3 ON DUPLICATE KEY UPDATE b=VALUES(b), c=VALUES(c)
COMMIT;
BEGIN
SET INSERT_ID=1;
INSERT INTO t1 SET b=1, c=1 ON DUPLICATE KEY UPDATE b=VALUES(b), c=VALUES(c)
SET INSERT_ID=2;
INSERT INTO t1 SET b=1, c=2 ON DUPLICATE KEY UPDATE b=VALUES(b), c=VALUES(c)
COMMIT;

This is what happens on master:
M1. Transaction 1 allocates autoinc value 1 and inserts the first row (1,1,1).
M2. Transaction 1 allocates autoinc value 2 for the second INSERT and tries
    to insert (2,1,2). Since there is a UK on b, the existing row is updated
    and becomes (1,1,2). Note that the allocated autoinc value was not used,
    so the next autoinc value is still 2.
M3. Transaction 2 allocates autoinc value 2 and inserts row (2,2,3).
M4. Transaction 2 commits and is written to the binlog.
M5. Transaction 1 commits, hence is written to the binlog after
    Transaction 1.

When the slave replays this binlog, the following happens:
S1. Transaction 2 inserts (2,2,3).
S2. Transaction 1 inserts row (1,1,1).
S3. Transaction 1 tries to insert row (2,1,2). Now this row clashes with
    *both* the PK on a and the UK on b. Which one it chooses to update is
    nondeterministic, but it happens to choose (2,2,3). Thus, it tries to
    replace (2,2,3) by (2,1,2). This fails because of the UK on b.

Correction: rpl_bug-master.opt should contain 
--innodb_autoinc_lock_mode=0

Also, the test case can be simplified: it's enough to have one single-statement transaction and a concurrent autocommited statement, as follows:

--source include/master-slave.inc
--source include/have_binlog_format_statement.inc
--source include/have_innodb.inc
CREATE TABLE t1 (a INT AUTO_INCREMENT PRIMARY KEY, b INT UNIQUE KEY, c INT) ENGINE=InnoDB;

INSERT INTO t1 SET b=1, c=1;

BEGIN;
INSERT INTO t1 SET b=1, c=2 ON DUPLICATE KEY UPDATE b=VALUES(b), c=VALUES(c);

  --connection master1
  INSERT INTO t1 SET b=2, c=3 ON DUPLICATE KEY UPDATE b=VALUES(b), c=VALUES(c);

--connection master
COMMIT;

SELECT * FROM t1;
--sync_slave_with_master
SELECT * FROM t1;

In fact, it seems the reason for this bug is not innodb or autoincrement: it happens with concurrent transactions and INSERT ON DUPLICATE KEY UPDATE statements. Here is an even simpler test case (independent of innodb_autoinc_lock_mode):

--source include/master-slave.inc
--source include/have_binlog_format_statement.inc
--source include/have_innodb.inc
CREATE TABLE t1 (a INT PRIMARY KEY, b INT UNIQUE KEY, c INT) ENGINE = InnoDB;
INSERT INTO t1 VALUES (1, 1, 1);
BEGIN;
INSERT INTO t1 VALUES (2, 1, 2) ON DUPLICATE KEY UPDATE b=VALUES(b), c=VALUES(c);
  --connection master1
  INSERT INTO t1 VALUES(2, 2, 3) ON DUPLICATE KEY UPDATE b=VALUES(b), c=VALUES(c);
--connection master
COMMIT;
SELECT * FROM t1;
--sync_slave_with_master
SELECT * FROM t1;

I don't think there is any way to solve this with statement-based replication. So the solution is to mark INSERT..ON DUPLICATE KEY IGNORE unsafe. That will cause a warning if binlog_format=statement and switch to row logging when binlog_format=row.

Maybe it is enough to mark INSERT..ON DUPLICATE KEY IGNORE unsafe when it is inside a transaction. I'm not 100% sure about that, it would need to be investigated more carefully.

Actually, there are two bugs here:

1. The primary problem is an innodb bug (contrary to what I said in my
   last comment). When the following statement failed to find an entry
   for 2 in column a, it should have taken a gap lock preventing
   concurrent clients from adding a row with value 2:

     INSERT INTO t1 VALUES (2, 1, 2)
            ON DUPLICATE KEY UPDATE b=VALUES(b) c=VALUES(c)   

   In general, if there is more than one unique or primary key, and
   INSERT ... ON DUPLICATE KEY UPDATE does not find any value for one
   of the keys, it should take a gap lock on that key and hold it
   for the duration of the transaction. This is not only a replication
   problem, but also a transaction isolation problem.

2. There is also a related replication bug. I reported a separate bug
   for that: see BUG#58637.

The original bug would be solved by either of these fixes (if the reporter switched to binlog_format=mixed). However, the reason that it failed in the original report was problem 1. So we will keep this bug for problem 1 (the innodb problem), and use BUG#58637 to track problem 2.

The following test case shows that the interleaved transactions execute in a non-serializable order.

--source include/have_innodb.inc
--connect (other,127.0.0.1,root,,test,$MASTER_MYPORT,)
--connection default

CREATE TABLE t1 (a INT UNIQUE KEY, b INT UNIQUE KEY, c INT) ENGINE = InnoDB;

--echo # Interleaved transactions
INSERT INTO t1 VALUES (1, 1, 1);
BEGIN;
INSERT INTO t1 VALUES (2, 1, 2) ON DUPLICATE KEY UPDATE c=VALUES(c);
  --connection other
  BEGIN;
  INSERT INTO t1 VALUES (2, 2, 3) ON DUPLICATE KEY UPDATE c=VALUES(c);
  COMMIT;
--connection default
INSERT INTO t1 VALUES (2, 3, 4) ON DUPLICATE KEY UPDATE c=VALUES(c);
COMMIT;
SELECT * FROM t1;

--echo # Serialized execution order 1
DELETE FROM t1;
INSERT INTO t1 VALUES (1, 1, 1);
BEGIN;
INSERT INTO t1 VALUES (2, 1, 2) ON DUPLICATE KEY UPDATE c=VALUES(c);
INSERT INTO t1 VALUES (2, 3, 4) ON DUPLICATE KEY UPDATE c=VALUES(c);
COMMIT;
BEGIN;
INSERT INTO t1 VALUES (2, 2, 3) ON DUPLICATE KEY UPDATE c=VALUES(c);
COMMIT;
SELECT * FROM t1;

--echo # Serialized execution order 2
DELETE FROM t1;
INSERT INTO t1 VALUES (1, 1, 1);
BEGIN;
INSERT INTO t1 VALUES (2, 2, 3) ON DUPLICATE KEY UPDATE c=VALUES(c);
COMMIT;
BEGIN;
INSERT INTO t1 VALUES (2, 1, 2) ON DUPLICATE KEY UPDATE c=VALUES(c);
INSERT INTO t1 VALUES (2, 3, 4) ON DUPLICATE KEY UPDATE c=VALUES(c);
COMMIT;
SELECT * FROM t1;

DROP TABLE t1;

Btw, Mark is right: there is no replication problem specific to innodb_autoinc_lock_mode=1. It is as safe as innodb_autoinc_lock_mode=0. We should remove that warning from the documentation.

Btw, Mark is right: there is no replication problem specific to innodb_autoinc_lock_mode=1. It is as safe as innodb_autoinc_lock_mode=0. We should remove that warning from the documentation.

Given that this bug occurs with innodb_autoinc_lock_mode=0 but not when set to 1, I think that 1 is safer than 0.

It was added to the documentation due to this: http://bugs.mysql.com/bug.php?id=28781 people were complaining about gaps in autoincrement values which is normal behavior when innodb_autoinc_lock_mode=1, setting it to 0 uses the old lock mode and avoids gaps when using ON DUPLICATE KEY UPDATE. 

MyISAM version of this issue http://bugs.mysql.com/bug.php?id=24432 for 5.0.24 which they apparently fixed in 5.0.38 so autoinc gaps no longer occur, of course this COMPLETELY breaks replication between 5.0.24 and anything newer(so much for minor release changes).

Any plan to fix thi? this has been finger pointed by bug #52020 and im getting deadlocks on a simple insert on duplicate key update left and right.

Fixed as of 5.7.3, and here's the changelog entry:

An "INSERT ...ON DUPLICATE KEY UPDATE" statement run on a table with
multiple unique indexes would sometimes cause events to be written to the
binary log incorrectly.

Thank you for the bug report.

Revision to changelog entry:

"Concurrent "INSERT ...ON DUPLICATE KEY UPDATE" statements run on a table
with multiple unique indexes would sometimes cause events to be written to
the binary log incorrectly"

Reopening this bug. The patch has been reverted due to regressions. The 5.7.3 changelog entry has been removed.

The changelog entry noted previously has been added to the 5.7.4 changelog. 

Thank you for the bug report.

Posted by developer:
 
 The fix for this bug was reverted by by the fix for Bug #25966845 INSERT ON DUPLICATE KEY GENERATE A DEADLOCK in 5,7,26 and 8.0.15.