Bug #34768 nondeterministic INSERT using LIMIT logged in stmt mode if binlog_format=mixed
Submitted: 22 Feb 2008 21:19 Modified: 29 Mar 2008 18:24
Reporter: Sven Sandberg Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Replication Severity:S2 (Serious)
Version:5.1, 6.0 BK OS:Any
Assigned to: Sergey Vojtovich CPU Architecture:Any
Tags: binlog, limit, mixed mode, nondeterministic statement, replication, unsafe statement

[22 Feb 2008 21:19] Sven Sandberg
Description:
When the binlog uses mixed mode, it chooses to log statement-based if the statement is deterministic, and row-based if the statement is non-deterministic. Non-deterministic statements are, e.g., INSERT DELAYED, or statements that refer to a @@variable. See, e.g., WL#3339, BUG#20930, and BUG#31168.

However, statements that use LIMIT are unsafe (because the order in which rows are retrieved is not defined), but they are logged in statement mode if binlog_format=mixed.

See also http://www.xaprb.com/blog/2007/11/08/how-mysql-replication-got-out-of-sync/

How to repeat:
Run the following test case:

source include/have_log_bin.inc;
source include/have_binlog_format_mixed.inc;
create table t1 (a int);
insert delayed into t1 values (1), (2), (3);
insert into t1 select * from t1 limit 1;

Then run:

mysqlbinlog var/log/master-bin.000001

Suggested fix:
Mark all statements using LIMIT as unsafe (we can make an exception for statements that have an ORDER BY which includes a primary key).
[22 Feb 2008 21:38] Sveta Smirnova
Thank you for the report.

Verified as described.
[25 Feb 2008 19:51] Timothy Smith
Partial Workaround: add ORDER BY clause to INSERT ... SELECT statements.
[27 Feb 2008 17:53] Sven Sandberg
There is another related bug (perhaps less easy to detect for users), which we need to think about when fixing this:

Assume we execute a SELECT where a function is executed for each retrieved row. If the function has side effects, then the order of those side effects is undefined, so the statement is not safe to log.

For example,

CREATE TABLE t1 (a INT);
CREATE TABLE t2 (a INT, b INT KEY AUTO_INCREMENT)

CREATE FUNCTION f(x int)
RETURNS INT
INSERT INTO t2(a) VALUES (x);

INSERT INTO t1 VALUES (77), (88);
SELECT f(a) FROM t1;

Depending on the order rows are retrieved in the select, t2 will contain either
77, 1
88, 2
or
88, 1
77, 2

Similar situations can be constructed with triggers, etc. We can do this also without functions:

CREATE TABLE t1 (a INT);
CREATE TABLE t2 (a INT, b INT KEY AUTO_INCREMENT);
INSERT INTO t1 VALUES (77), (88);
INSERT INTO t2(a) SELECT * FROM t1;

Note that this may be difficult to implement. E.g., we *don't* want the following to give a warning, since f is only called once so nothing is nondeterministic:

SELECT f(1);

The following is a border case:

SELECT f(1) FROM t1;

Here, the order in which f is called is nondeterministic and f has side effects, but it does not matter since the two calls of f can be permuted without changing anything.

The following is safe, but probably too difficult to detect as safe:

SELECT f(a) FROM t1; # where column a of t1 is constant across all rows

In general, I think we should over-approximate, and mark statements as unsafe in all cases when we are not sure (i.e., when mysql is not sure).

I think a fundamental question to answer is whether we should detect unsafe-ness on a syntactic level by analyzing the query beforehand, or if we should detect it on-the-fly while executing the query. A syntactic way to do it could be, e.g., to identify components of the language which have side effects (such as table updates, and functions which do things that possibly have side effects, etc), and identify queries which execute such components in an undefined order, and mark them as unsafe. An on-the-fly way to do it could be, e.g., to record when side effects happen; record when two things happen in an undefined order; and mark a statement as unsafe when we detect that two things which happen in an undefined order have side effects.
[13 Mar 2008 8:50] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/43901

ChangeSet@1.2558, 2008-03-13 12:47:24+04:00, svoj@mysql.com +6 -0
  BUG#34768 - nondeterministic INSERT using LIMIT logged in stmt mode if
              binlog_format=mixed
  
  Statement-based replication of DELETE ... LIMIT, UPDATE ... LIMIT,
  INSERT ... SELECT ... LIMIT is not safe as order of rows is not
  defined.
  
  With this fix, we issue a warning that this statement is not safe to
  replicate in statement mode, or go to row-based mode in mixed mode.
[18 Mar 2008 9:55] Kristian Nielsen
A similar problem can occur with a simple multirow UPDATE of a column with a unique constraint. Such a statement can succeed or fail depending on the order in which rows are updated, if the after value of one row conflicts with the before value of another row.

In this case, if the update order differs on master and slave, it is possible for the query to succeed on the master, but fail with constraint violation on the slave.

Arguably, this is an error in the application, as the update on the master succeeded only by sheer luck. So I am not sure if it should be fixed by going to mixed mode?

Alternatively it might be worthwhile to mention this problem explicitly on http://dev.mysql.com/doc/refman/5.1/en/replication-sbr-rbr.html (or similar place) as a source of non-deterministic UPDATE that may fail in statement-based mode.

Test case showing this. This will fail in statement-based mode, but succeed in row-based mode.

--source include/master-slave.inc
--source include/not_embedded.inc

connection master;
create table t1 (a int primary key, b int unique) engine=myisam;
insert into t1 values (1,2), (2,1);
sync_slave_with_master;
# Re-write table on slave with same data, but different insert order.
# This emulates possible different SELECT order on master and slave
truncate t1;
insert into t1 values (2,1), (1,2);
# Update on master; this update fails or not depending on actual order of update
connection master;
update t1 set b=b+1;
select * from t1 order by a;

# This fails in mixed (or statement) based replication due to different update
# order on slave. Works in row-based replication.
sync_slave_with_master;
select * from t1 order by a;

connection master;
drop table t1;
sync_slave_with_master;
[18 Mar 2008 15:33] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/44178

ChangeSet@1.2558, 2008-03-18 19:30:15+04:00, svoj@mysql.com +6 -0
  BUG#34768 - nondeterministic INSERT using LIMIT logged in stmt mode if
              binlog_format=mixed
  
  Statement-based replication of DELETE ... LIMIT, UPDATE ... LIMIT,
  INSERT ... SELECT ... LIMIT is not safe as order of rows is not
  defined.
  
  With this fix, we issue a warning that this statement is not safe to
  replicate in statement mode, or go to row-based mode in mixed mode.
  
  Note that we may consider a statement as safe if ORDER BY primary_key
  is present. However it may confuse users to see very similiar statements
  replicated differently.
[18 Mar 2008 16:28] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/44181

ChangeSet@1.2558, 2008-03-18 20:25:34+04:00, svoj@mysql.com +6 -0
  BUG#34768 - nondeterministic INSERT using LIMIT logged in stmt mode if
              binlog_format=mixed
  
  Statement-based replication of DELETE ... LIMIT, UPDATE ... LIMIT,
  INSERT ... SELECT ... LIMIT is not safe as order of rows is not
  defined.
  
  With this fix, we issue a warning that this statement is not safe to
  replicate in statement mode, or go to row-based mode in mixed mode.
  
  Note that we may consider a statement as safe if ORDER BY primary_key
  is present. However it may confuse users to see very similiar statements
  replicated differently.
  
  Note 2: regular UPDATE statement (w/o LIMIT) is unsafe as well, but
  this patch doesn't address this issue. See comment from Kristian
  posted 18 Mar 10:55.
[25 Mar 2008 13:31] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/44389

ChangeSet@1.2571, 2008-03-25 17:28:12+04:00, svoj@mysql.com +4 -0
  BUG#34768 - nondeterministic INSERT using LIMIT logged in stmt mode
              if binlog_format=mixed
  
  Addition to fix for BUG#34768: fixed test case failures discovered
  by pushbuild.
[27 Mar 2008 11:18] Bugs System
Pushed into 5.1.24-rc
[27 Mar 2008 17:50] Bugs System
Pushed into 6.0.5-alpha
[29 Mar 2008 17:53] Jon Stephens
Documented bugfix in the 5.1.24 and 6.0.5 changelogs as follows:

        When using statement-based replication and a DELETE, UPDATE, or
        INSERT ... SELECT statement using a LIMIT clause is encountered, a warning
        that the statement is not safe to replicate in statement mode is now
        issued; when using MIXED mode, the statement is now
        replicated using the row-based format.
        
        Exception. The statement is considered safe for statement-based 
        replication if it uses ORDER BY primary_key.

Also added new information about the effects of LIMIT clause on these statements to replication-features section of 5.1/6.0 Manual.
[29 Mar 2008 18:16] Sergey Vojtovich
Jon, sorry for confusion, but the exception is not valid. If LIMIT clause is specified, we always go mixed mode or issue a warning. In other words, we the statement is not considered safe if it uses ORDER BY primary_key.
[29 Mar 2008 18:24] Jon Stephens
Removed erroneous description of 'exception' from changelog and other docs per Svoj's comment above.
[2 Apr 2008 20:29] Jon Stephens
Also noted in the 5.1.23-ndb-6.3.11 changelog.