Bug #41263 Replication breaks after 'INSERT ... ON DUPLICATE UPDATE ...'
Submitted: 5 Dec 2008 16:18 Modified: 23 Mar 16:23
Reporter: Stanisław Pitucha
Status: Duplicate
Category:Server: Replication Severity:S2 (Serious)
Version:5.1.30 OS:Linux
Assigned to: Target Version:
Tags: replication, insert on duplicate update, stop

[5 Dec 2008 16:18] Stanisław Pitucha
Description:
Issue exists in 5.0.45 too. Might be related to bug#17684. If you don't think removing
auto_increment column is a valid workaround, severity might as well be S1.

If load is high enough, replication stops after an 'INSERT ... ON DUPLICATE UPDATE ...'
with error:

081204 22:49:27 [ERROR] Slave SQL: Error 'Duplicate entry 'XXXX' for key 'ab_idx'' on
query. Default database: 'openser'. Query: 'XXXXXXXXXXXXXXXXXXXXXXXX', Error_code: 1062
081204 22:49:27 [Warning] Slave: Duplicate entry 'XXXX' for key 'registration_idx'
Error_code: 1062
081204 22:49:27 [ERROR] Error running query, slave SQL thread aborted. Fix the p
roblem, and restart the slave SQL thread with "SLAVE START". We stopped at log '
mysql-bin.000001' position 68178

`show slave status` still displays 'Waiting for master to send event' - just with the
Last_error changed.
Error stops occuring after removing the auto_increment column in that table.

Original schema where it was first observed:
CREATE TABLE `location` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `username` varchar(64) NOT NULL DEFAULT '',
  `domain` varchar(64) DEFAULT NULL,
  `contact` varchar(255) NOT NULL DEFAULT '',
  `received` varchar(128) DEFAULT NULL,
  `path` varchar(128) DEFAULT NULL,
  `expires` datetime NOT NULL DEFAULT '2020-05-28 21:32:15',
  `q` float(10,2) NOT NULL DEFAULT '1.00',
  `callid` varchar(255) NOT NULL DEFAULT 'Default-Call-ID',
  `cseq` int(11) NOT NULL DEFAULT '13',
  `last_modified` datetime NOT NULL DEFAULT '1900-01-01 00:00:01',
  `flags` int(11) NOT NULL DEFAULT '0',
  `cflags` int(11) NOT NULL DEFAULT '0',
  `user_agent` varchar(255) NOT NULL DEFAULT '',
  `socket` varchar(64) DEFAULT NULL,
  `methods` int(11) DEFAULT NULL,
  UNIQUE KEY `registration_idx` (`username`,`contact`),
) ENGINE=MyISAM DEFAULT CHARSET=latin1

How to repeat:
Create a MyISAM table t with:
- auto_increment primary key `id`
- unique index `ab_idx` on two columns `a` and `b` (varchar(255))
- varchar(255) `c`

Replicate the table in master-master scheme.

Generate lots of queries (issue seems to be load-dependant - >10 queries/sec cause it)
like:
INSERT INTO t (a, b, c) VALUES ('some', 'long', 'strings') ON DUPLICATE UPDATE
c='strings';
With `a` and `b` values that repeat every 10 or so query.

One of the hosts stops replication as described above after couple of seconds.
[8 Dec 2008 8:48] Sveta Smirnova
Thank you for the report.

Please provide configuration files for both servers. Particularly I want to check if you
have set auto_increment_offset and/or auto_increment_increment values.
[8 Dec 2008 12:26] Stanisław Pitucha
Yes. It's a master-master replication, so one server has:

server-id = 1
replicate-same-server-id = 0
auto-increment-increment = 2
auto-increment-offset = 1

and the other:

server-id = 2
replicate-same-server-id = 0
auto-increment-increment = 2
auto-increment-offset = 2
[8 Dec 2008 13:19] Stanisław Pitucha
Just to make the answer complete:

--- >8 --- config:

[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
user=mysql

server-id = 1
replicate-same-server-id = 0
auto-increment-increment = 2
auto-increment-offset = 1

master-host = XXX
master-user = slave-vnm2
master-password = YYY
master-connect-retry = 60

log-bin = /var/lib/mysql/mysql-bin.log
log-slave-updates

relay-log = /var/lib/mysql/slave-relay.log
relay-log-index = /var/lib/mysql/slave-relay-log.index

expire_logs_days = 7
max_binlog_size = 500M

set-variable=long_query_time=3
log-slow-queries = /var/lib/mysql/slow-queries.log

slave_exec_mode = IDEMPOTENT
binlog_format = STATEMENT

# tuning
skip-innodb

key_buffer_size = 64M
thread_cache_size = 4
table_cache = 80
query_cache_size = 0

[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
[9 Dec 2008 17:46] Sveta Smirnova
Thank you for the feedback.

I can not repeat described behavior with generic test. Do you have triggers on table
location?
[9 Dec 2008 20:55] Stanisław Pitucha
No triggers there. There are only 3 queries running on that table:
INSERT INTO location ... ON DUPLICATE UPDATE ...; (avg. every 2 minutes per user,
updating all fields, apart from id)
SELECT ... FROM location WHERE username = ... AND domain = ...; (rare)
DELETE ALL FROM location WHERE expires < NOW(); (once every 20 secs)

On a test system with 10 users, I couldn't get the replication to crash for a couple of
days. After putting all other users on the system (around 900), it crashes before binlog
reaches 2MB.

The frequency of INSERTs is varied though - for each user it's between 10 seconds and 1
hour - average should be ~2 min. I'm not sure what kind of information would be helpful
here. I could send you a config to a daemon which causes that behaviour, but I don't want
to post the binlog itself, unless there's a way to remove some data (phone numbers +
passwords).
[15 Jan 14:21] Susanne Ebrecht
Sorry, I am totally confused here. You wrote it happens on version 5.0.45 but when I look
into the header there version 5.1.30 is given.

Which version did you have this issue?

Do you have this issue with actual 5.1 (5.1.30) version too?
[15 Jan 16:00] Stanisław Pitucha
Sorry for the confusion.
It happens on:
- 5.0.45 (centos package)
- some 5.1.XX version with XX around 20, but I can't remember the exact number there (rpm
on mysql site)
- 5.1.30 (rpm from the mysql site)

The attached error message is from 5.1.30, but the previous versions had the same reason
for failing (not sure if the same error_code).
[16 Jan 17:45] Susanne Ebrecht
This could a well known effect of statement based replication. Please switch to row based
replication and let us know if you still get this behaviour. Row based replication is
possible with MySQL 5.1. Of course we recommend to use actual GA version here (5.1.30).
[17 Feb 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[23 Feb 13:51] Stanisław Pitucha
I tried switching to RBR and it worked fine for the last 2 weeks.

But I'm not sure it's "a well known effect of statement based replication.". I looked for
it before and browsed the documentation again just now and found only the "For complex
statements, the statement must be evaluated and executed on the slave...." point on
http://dev.mysql.com/doc/refman/5.1/en/replication-sbr-rbr.html But it's not included in
the "Statements which are unsafe for SBR." point.

Maybe something like "INSERT ... ON DUPLICATE ..." should be mentioned there as a
specific example of a query unsafe for SBR?
[9 Mar 5:47] Roel Van de Paar
Notice bug http://bugs.mysql.com/bug.php?id=28781 which looks very similar.
[23 Mar 16:23] Susanne Ebrecht
This is a duplicate of bug #28781.