Bug #38205 | Row-based Replication (RBR) causes inconsistencies: HA_ERR_FOUND_DUPP_KEY | ||
---|---|---|---|
Submitted: | 17 Jul 2008 16:44 | Modified: | 6 May 2009 14:47 |
Reporter: | Eric Jensen | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S2 (Serious) |
Version: | 5.1.25 | OS: | Linux (x86_64) |
Assigned to: | Andrei Elkin | CPU Architecture: | Any |
Tags: | RBR, replication |
[17 Jul 2008 16:44]
Eric Jensen
[17 Jul 2008 16:59]
Eric Jensen
The last bullet on http://dev.mysql.com/doc/refman/5.1/en/replication-sbr-rbr.html also seems to also suggest a problem with RBR on non-transactional tables to me, but I find it hard to understand...I think some elaboration is in order.
[21 Jul 2008 11:42]
Sveta Smirnova
Thank you for the report. Please provide output of SHOW CREATE TABLE problem_table and configuration files for both master and slave. Also please indicate if slave is read-only.
[21 Jul 2008 22:28]
Eric Jensen
CREATE TABLE `problem_table` ( `a_id` int(10) unsigned NOT NULL, `b_id` int(10) unsigned NOT NULL, PRIMARY KEY (`a_id`,`b_id`) ) ENGINE=MyISAM DEFAULT CHARSET=utf8
[21 Jul 2008 22:38]
Eric Jensen
slave does not have read-only set, but it is not being written to by anything other than the replication
[22 Jul 2008 17:45]
Sveta Smirnova
Thank you for the feedback. I can not repeat the problem with test data. Could you please also send us binary log binlog.000003?
[22 Jul 2008 23:25]
Eric Jensen
I don't have the actual binlog still sitting around. But it wasn't one particular incident. Anytime I stopped and started the slave it would break with this error.
[23 Jul 2008 10:28]
Susanne Ebrecht
Many thanks for writing a bug report. Did you only stop/start the slave or did you restart the server as well?
[23 Jul 2008 16:56]
Eric Jensen
I both stopped/started the slave and restarted the server. Either of them would break it.
[24 Jul 2008 19:31]
Sveta Smirnova
Thank you for the feedback. Problem is I could not repeat described behavior with test data. So example of SHOW CREATE TABLE, query against it and binary log error happens with would be appreciated. Easiest way to get such binary log: 1. On master: FLUSH LOGS 2. On master: issue some query which could break replication 3. On slave: STOP SLAVE; START SLAVE; 4. Check if replication is broken, then send us new small binary log. But if you can get sequence of queries on master which lead to such behavior it can be enough.
[4 Aug 2008 13:07]
Susanne Ebrecht
Many thanks for writing a bug report. This is a known issue for replication of MyISAM tables. http://dev.mysql.com/doc/refman/5.0/en/replication-features-slaveshutdown.html According to our documentation, you can give this workaround a try: http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-replication-issues.html SET GLOBAL SLAVE_EXEC_MODE = 'IDEMPOTENT'; Usually, this should work with MyISAM too. Please let us know, if the workaround helps to shrink your problems with this duplicate key errors.
[4 Aug 2008 16:33]
Eric Jensen
Thanks for the tip. I have not tested it yet, but the documentation for SLAVE_EXEC_MODE = 'IDEMPOTENT' says "this mode should be employed in multi-master replication, circular replication, and some other special replication scenarios". I am not doing anything special, just replicating some inserts (no updates or deletes) into a two-column primary key. Is just using MyISAM with row-based replication at all an edge case that requires IDEMPOTENT?
[5 Aug 2008 10:03]
Susanne Ebrecht
I will set your bug report as feature request. The feature request is to stop this "annoying" duplicate key errors on the slave for MyISAM after restarting the slave.
[5 Aug 2008 14:04]
Eric Jensen
At the very least this is a documentation bug. If the default replication (MIXED) breaks with the default storage engine (MyISAM), most people will view this as a bug. It seems like if you need IDEMPOTENT to be able to stop and start replication at all in the default set up, then IDEMPOTENT should also be the default. Either that, or the documentation should say in big red letters that you must turn on IDEMPOTENT for replication to work with MyISAM tables.
[24 Sep 2008 19:41]
Philip Stoev
I am able to hit this repeatedly -- unless --slave-exec-mode=Idempotent is used, issuing START/STOP slave with MyISAM causes row-based replication to break with a duplicate/key or row not found error. This happens even on single-threaded workloads with extremely simple queries and does not require crashing the slave.
[28 Sep 2008 15:56]
Lars Thalmann
This should work in the default mode, so this is a bug. More specifically, the slave should stop in the correct position so that log is not re-executed. The idempotent mode removes some error conditions so that a log can be re-executed. Idempotent = you can apply the log twice with the same result as applying it once. It is only recommended in setups where the logs need to be re-executed in case of failures. Normally it should not be needed.
[28 Nov 2008 6:46]
Eric Jensen
What is the status of this in the 5.1.30 GA? Can I use RBR with MyISAM without enabling the dangerous replication mode?
[29 Nov 2008 15:02]
Mark Callaghan
Can we get more detail on this? On the bright side, I think the problem is that events are executed twice rather than skipped. Will this always lead to an error? Will this ever lead to a case where the slave data and master data don't match?
[29 Nov 2008 16:53]
Eric Jensen
I would think that if it was a table without a unique key, the repeated execution would lead to inconsistency, although I haven't tested that case. It seems like a fairly major bug to me that replication re-executes statements due to a normal start/stop slave on the default table format (myisam).
[9 Dec 2008 21:56]
Mats Kindahl
The behavior of STOP SLAVE in 5.0 was that the slave stops immediately after execution of the current statement, even if inside a transaction. For transactional tables, this is not a problem since in that case, the transaction will be rolled back, but for non-transactional changes, this is disastrous. This behavior have been inherited by 5.1. Note that the same problem will occur when using statement-based replication in both 5.0 and 5.1. As a resolution to this bug, we will change the semantics of STOP SLAVE so that the currently executing group (that is, transaction or single statement outside a transaction) will finish execution and STOP SLAVE will not return until the SQL thread has stopped executing the transaction (it will, of course, not wait for the SQL thread when a STOP SLAVE IO_THREAD is issued). This will preserve the current slave_exec_mode default as STRICT in order to catch "real" duplicate key errors, while still retaining backward compatibility with any scripts that expect the slave to have stopped after issuing a STOP SLAVE.
[10 Dec 2008 0:49]
Mark Callaghan
Thanks Mats. I can endorse RBR now, but we don't really use 5.1 yet.
[10 Dec 2008 18:21]
Andrei Elkin
The current is not a Row-Based-Replication issue. As the referred page http://dev.mysql.com/doc/refman/5.0/en/replication-features-slaveshutdown.html claims (although not explains) MyISAM is not shutdown-proof. SLAVE STOP essentially kills the current executing event, should it be Query or Row one. In the case of MyISAM table and a *query* like UPDATE, DELETE the being killed query can be interrupted in the middle, as on the master so on the slave where an event can be in either the Query or the Row format (INSERT is perculiar in this regard, its multi-rows work is kill-proof). With the event of either type on Innodb table there is no issue because interruption with STOP SLAVE leads to rollback of the transaction the event is part of. Idempotent can not be a default because it would release from checking of existence on the slave of the before image of the row-event. I dare to say the current is not really a bug, not for RBR at least. Although we realize STOP SLAVE can be made more soft and intelligent.
[10 Dec 2008 19:21]
Andrei Elkin
After speaking to Support we came to consensus: the current is a bug in too course implementation of STOP SLAVE to allow interruption of a an event (incl. a plain query!) in the middle.
[10 Dec 2008 23:14]
Andrei Elkin
Correcting my early comments on that STOP SLAVE can interrupt a query event. This is not possible. Indeed, STOP SLAVE does not work through kill-query-interface but rather flags the state of the SQL thread a way to stop it only at the end of the being currently handled event. The can be an internal event for associated with the query event, which is always the last in such group, but there is no harm to execute the internal and stop at the query event itself. So the current artifact strongly affects the row-based replication. Still, there is a possibility for this artifact to happen the Statement format when a transaction on the master modifies both transaction and non-transaction tables. Such a mixed group of queries if interrupted on slave will restart from the beginning of the transaction, but the non-transaction tables might be left modified by the first attempt of execution of the transaction.
[26 Feb 2009 23:11]
James Day
See also bug #43217 which now appears very like a duplicate of this one. Bug #43217 is a fork of part of bug #319. Should comment on bug #43217 and close as duplicate if this bug fix will fix that problem.
[23 Mar 2009 14:45]
Susanne Ebrecht
Bug #43686 is set as duplicate of this bug here.
[26 Mar 2009 8:25]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/70478 2836 Andrei Elkin 2009-03-26 Bug#38205 Row-based Replication (RBR) causes inconsistencies: HA_ERR_FOUND_DUP Bug#319 if while a non-transactional slave is replicating a transaction possible problem It is impossible to roll back a mixed engines transaction when one of the engine is non-transaction. In replication that fact is crucial because the slave can not safely re-apply a transction that was interrupted with STOP SLAVE. Fixed with making STOP SLAVE not be effective immediately in the case the current group of replication events has modified a non-transaction table. In order for slave to leave either the group needs finishing or the user issues KILL QUERY|CONNECTION slave_thread_id.
[9 Apr 2009 13:06]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/71784 2837 Andrei Elkin 2009-04-09 Bug #38205 Row-based Replication (RBR) causes inconsistencies: HA_ERR_FOUND_DUPP_KEY Bug#319 if while a non-transactional slave is replicating a transaction possible problem only testing related: addressing reviewers' comments.
[21 Apr 2009 8:31]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/72545 2861 Andrei Elkin 2009-04-21 [merge] merge bug#38205 fixes to 5.1-bt
[21 Apr 2009 9:54]
Andrei Elkin
Pushed to 5.1+ -bugteam.
[5 May 2009 19:43]
Bugs System
Pushed into 5.1.35 (revid:davi.arnaut@sun.com-20090505190206-9xmh7dlc6kom8exp) (version source revid:davi.arnaut@sun.com-20090505190206-9xmh7dlc6kom8exp) (merge vers: 5.1.35) (pib:6)
[6 May 2009 9:58]
Jon Stephens
Documentation: See Bug #319. Set status NDI: waiting for 6.0 merge.
[6 May 2009 14:07]
Bugs System
Pushed into 6.0.12-alpha (revid:svoj@sun.com-20090506125450-yokcmvqf2g7jhujq) (version source revid:aelkin@mysql.com-20090421094803-hho7a34gdsby7382) (merge vers: 6.0.11-alpha) (pib:6)
[6 May 2009 14:47]
Jon Stephens
Also documented in the 6.0.12 changelog (see Bug #319). Closed.
[15 Jun 2009 8:29]
Bugs System
Pushed into 5.1.35-ndb-6.3.26 (revid:jonas@mysql.com-20090615074202-0r5r2jmi83tww6sf) (version source revid:jonas@mysql.com-20090615070837-9pccutgc7repvb4d) (merge vers: 5.1.35-ndb-6.3.26) (pib:6)
[15 Jun 2009 9:09]
Bugs System
Pushed into 5.1.35-ndb-7.0.7 (revid:jonas@mysql.com-20090615074335-9hcltksp5cu5fucn) (version source revid:jonas@mysql.com-20090615072714-rmfkvrbbipd9r32c) (merge vers: 5.1.35-ndb-7.0.7) (pib:6)
[15 Jun 2009 9:50]
Bugs System
Pushed into 5.1.35-ndb-6.2.19 (revid:jonas@mysql.com-20090615061520-sq7ds4yw299ggugm) (version source revid:jonas@mysql.com-20090615054654-ebgpz7elwu1xj36j) (merge vers: 5.1.35-ndb-6.2.19) (pib:6)
[23 Jul 2009 10:24]
Bugs System
Pushed into 5.4.4-alpha (revid:alik@sun.com-20090723102221-ps4uaphwbxzj8p0q) (version source revid:aelkin@mysql.com-20090708180952-73h6cpmg3r5wtxwa) (merge vers: 5.4.4-alpha) (pib:11)
[12 Aug 2009 21:56]
Paul DuBois
Noted in 5.4.2 changelog because next 5.4 version will be 5.4.2 and not 5.4.4.
[12 Aug 2009 21:58]
Paul DuBois
Noted in 5.4.2 changelog because next 5.4 version will be 5.4.2 and not 5.4.4.
[14 Aug 2009 22:48]
Paul DuBois
Ignore previous comment about 5.4.2.
[7 Oct 2009 1:32]
Paul DuBois
The 5.4 fix now has been pushed into 5.4.2.