Bug #34768 nondeterministic INSERT using LIMIT logged in stmt mode if binlog_format=mixed
Submitted: 22 Feb 2008 22:19 Modified: 29 Mar 2008 19:24
Reporter: Sven Sandberg
Status: Closed
Category:Server: Replication Severity:S2 (Serious)
Version:5.1, 6.0 BK OS:Any
Assigned to: Sergey Vojtovich Target Version:5.1.24
Tags: replication, binlog, limit, mixed mode, unsafe statement, nondeterministic statement
Triage: D2 (Serious)

[22 Feb 2008 22:19] Sven Sandberg
Description:
When the binlog uses mixed mode, it chooses to log statement-based if the statement is
deterministic, and row-based if the statement is non-deterministic. Non-deterministic
statements are, e.g., INSERT DELAYED, or statements that refer to a @@variable. See,
e.g., WL#3339, BUG#20930, and BUG#31168.

However, statements that use LIMIT are unsafe (because the order in which rows are
retrieved is not defined), but they are logged in statement mode if binlog_format=mixed.

See also http://www.xaprb.com/blog/2007/11/08/how-mysql-replication-got-out-of-sync/

How to repeat:
Run the following test case:

source include/have_log_bin.inc;
source include/have_binlog_format_mixed.inc;
create table t1 (a int);
insert delayed into t1 values (1), (2), (3);
insert into t1 select * from t1 limit 1;

Then run:

mysqlbinlog var/log/master-bin.000001

Suggested fix:
Mark all statements using LIMIT as unsafe (we can make an exception for statements that
have an ORDER BY which includes a primary key).
[22 Feb 2008 22:38] Sveta Smirnova
Thank you for the report.

Verified as described.
[25 Feb 2008 20:51] Timothy Smith
Partial Workaround: add ORDER BY clause to INSERT ... SELECT statements.
[27 Feb 2008 18:53] Sven Sandberg
There is another related bug (perhaps less easy to detect for users), which we need to
think about when fixing this:

Assume we execute a SELECT where a function is executed for each retrieved row. If the
function has side effects, then the order of those side effects is undefined, so the
statement is not safe to log.

For example,

CREATE TABLE t1 (a INT);
CREATE TABLE t2 (a INT, b INT KEY AUTO_INCREMENT)

CREATE FUNCTION f(x int)
RETURNS INT
INSERT INTO t2(a) VALUES (x);

INSERT INTO t1 VALUES (77), (88);
SELECT f(a) FROM t1;

Depending on the order rows are retrieved in the select, t2 will contain either
77, 1
88, 2
or
88, 1
77, 2

Similar situations can be constructed with triggers, etc. We can do this also without
functions:

CREATE TABLE t1 (a INT);
CREATE TABLE t2 (a INT, b INT KEY AUTO_INCREMENT);
INSERT INTO t1 VALUES (77), (88);
INSERT INTO t2(a) SELECT * FROM t1;

Note that this may be difficult to implement. E.g., we *don't* want the following to give
a warning, since f is only called once so nothing is nondeterministic:

SELECT f(1);

The following is a border case:

SELECT f(1) FROM t1;

Here, the order in which f is called is nondeterministic and f has side effects, but it
does not matter since the two calls of f can be permuted without changing anything.

The following is safe, but probably too difficult to detect as safe:

SELECT f(a) FROM t1; # where column a of t1 is constant across all rows

In general, I think we should over-approximate, and mark statements as unsafe in all
cases when we are not sure (i.e., when mysql is not sure).

I think a fundamental question to answer is whether we should detect unsafe-ness on a
syntactic level by analyzing the query beforehand, or if we should detect it on-the-fly
while executing the query. A syntactic way to do it could be, e.g., to identify
components of the language which have side effects (such as table updates, and functions
which do things that possibly have side effects, etc), and identify queries which execute
such components in an undefined order, and mark them as unsafe. An on-the-fly way to do it
could be, e.g., to record when side effects happen; record when two things happen in an
undefined order; and mark a statement as unsafe when we detect that two things which
happen in an undefined order have side effects.
[13 Mar 2008 9:50] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/43901

ChangeSet@1.2558, 2008-03-13 12:47:24+04:00, svoj@mysql.com +6 -0
  BUG#34768 - nondeterministic INSERT using LIMIT logged in stmt mode if
              binlog_format=mixed
  
  Statement-based replication of DELETE ... LIMIT, UPDATE ... LIMIT,
  INSERT ... SELECT ... LIMIT is not safe as order of rows is not
  defined.
  
  With this fix, we issue a warning that this statement is not safe to
  replicate in statement mode, or go to row-based mode in mixed mode.
[18 Mar 2008 10:55] Kristian Nielsen
A similar problem can occur with a simple multirow UPDATE of a column with a unique
constraint. Such a statement can succeed or fail depending on the order in which rows are
updated, if the after value of one row conflicts with the before value of another row.

In this case, if the update order differs on master and slave, it is possible for the
query to succeed on the master, but fail with constraint violation on the slave.

Arguably, this is an error in the application, as the update on the master succeeded only
by sheer luck. So I am not sure if it should be fixed by going to mixed mode?

Alternatively it might be worthwhile to mention this problem explicitly on
http://dev.mysql.com/doc/refman/5.1/en/replication-sbr-rbr.html (or similar place) as a
source of non-deterministic UPDATE that may fail in statement-based mode.

Test case showing this. This will fail in statement-based mode, but succeed in row-based
mode.

--source include/master-slave.inc
--source include/not_embedded.inc

connection master;
create table t1 (a int primary key, b int unique) engine=myisam;
insert into t1 values (1,2), (2,1);
sync_slave_with_master;
# Re-write table on slave with same data, but different insert order.
# This emulates possible different SELECT order on master and slave
truncate t1;
insert into t1 values (2,1), (1,2);
# Update on master; this update fails or not depending on actual order of update
connection master;
update t1 set b=b+1;
select * from t1 order by a;

# This fails in mixed (or statement) based replication due to different update
# order on slave. Works in row-based replication.
sync_slave_with_master;
select * from t1 order by a;

connection master;
drop table t1;
sync_slave_with_master;
[18 Mar 2008 16:33] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/44178

ChangeSet@1.2558, 2008-03-18 19:30:15+04:00, svoj@mysql.com +6 -0
  BUG#34768 - nondeterministic INSERT using LIMIT logged in stmt mode if
              binlog_format=mixed
  
  Statement-based replication of DELETE ... LIMIT, UPDATE ... LIMIT,
  INSERT ... SELECT ... LIMIT is not safe as order of rows is not
  defined.
  
  With this fix, we issue a warning that this statement is not safe to
  replicate in statement mode, or go to row-based mode in mixed mode.
  
  Note that we may consider a statement as safe if ORDER BY primary_key
  is present. However it may confuse users to see very similiar statements
  replicated differently.
[18 Mar 2008 17:28] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/44181

ChangeSet@1.2558, 2008-03-18 20:25:34+04:00, svoj@mysql.com +6 -0
  BUG#34768 - nondeterministic INSERT using LIMIT logged in stmt mode if
              binlog_format=mixed
  
  Statement-based replication of DELETE ... LIMIT, UPDATE ... LIMIT,
  INSERT ... SELECT ... LIMIT is not safe as order of rows is not
  defined.
  
  With this fix, we issue a warning that this statement is not safe to
  replicate in statement mode, or go to row-based mode in mixed mode.
  
  Note that we may consider a statement as safe if ORDER BY primary_key
  is present. However it may confuse users to see very similiar statements
  replicated differently.
  
  Note 2: regular UPDATE statement (w/o LIMIT) is unsafe as well, but
  this patch doesn't address this issue. See comment from Kristian
  posted 18 Mar 10:55.
[25 Mar 2008 14:31] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/44389

ChangeSet@1.2571, 2008-03-25 17:28:12+04:00, svoj@mysql.com +4 -0
  BUG#34768 - nondeterministic INSERT using LIMIT logged in stmt mode
              if binlog_format=mixed
  
  Addition to fix for BUG#34768: fixed test case failures discovered
  by pushbuild.
[27 Mar 2008 12:18] Bugs System
Pushed into 5.1.24-rc
[27 Mar 2008 18:50] Bugs System
Pushed into 6.0.5-alpha
[29 Mar 2008 18:53] Jon Stephens
Documented bugfix in the 5.1.24 and 6.0.5 changelogs as follows:

        When using statement-based replication and a DELETE, UPDATE, or
        INSERT ... SELECT statement using a LIMIT clause is encountered, a warning
        that the statement is not safe to replicate in statement mode is now
        issued; when using MIXED mode, the statement is now
        replicated using the row-based format.
        
        Exception. The statement is considered safe for statement-based 
        replication if it uses ORDER BY primary_key.

Also added new information about the effects of LIMIT clause on these statements to
replication-features section of 5.1/6.0 Manual.
[29 Mar 2008 19:16] Sergey Vojtovich
Jon, sorry for confusion, but the exception is not valid. If LIMIT clause is specified, we
always go mixed mode or issue a warning. In other words, we the statement is not
considered safe if it uses ORDER BY primary_key.
[29 Mar 2008 19:24] Jon Stephens
Removed erroneous description of 'exception' from changelog and other docs per Svoj's
comment above.
[2 Apr 2008 22:29] Jon Stephens
Also noted in the 5.1.23-ndb-6.3.11 changelog.