Bug #14769 Function fails to replicate if fails half-way (slave stops)
Submitted: 8 Nov 2005 20:23 Modified: 22 Feb 2006 17:59
Reporter: Guilhem Bichot Email Updates:
Status: Closed Impact on me:
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:5.0 OS:Linux (linux)
Assigned to: Guilhem Bichot CPU Architecture:Any

[8 Nov 2005 20:23] Guilhem Bichot
create table t2 (a int, unique(a));
delimiter | ;
create function fn1()
       returns int
  insert into t2 values(20),(20);
  return 10;
delimiter ; |
select fn1();
on master.
Then the SELECT will return error 1062 ("duplicate entry etc"), will go to binlog in this form:
#010909  3:46:40 server id 1  end_log_pos 5278  Query   thread_id=2     exec_time=0 error_code=1062
SET TIMESTAMP=1000000000;
DO `fn1`();
It was translated to DO but with the error code of the SELECT.
But as DO always returns "no error", on slave the DO will be executed and will return "no error", and so slave will stop saying:
051108 19:44:44 [ERROR] Slave: Query caused different errors on master and slave. Error on master: 'Duplicate entry '%-.64s'
 for key %d' (1062), Error on slave: 'no error' (0). Default database: 'mysqltest1'. Query: 'DO `fn1`()', Error_code: 0

How to repeat:
see description

Suggested fix:
Binlog "SELECT fn1()" instead of "DO fn1()", as was planned by SergeyP anyway.
This way, the slave will execute SELECT and have the 1062 error code.
But with this fix, what if, on the opposite, "DO fn1" is issued on master? It will be ok because "DO fn1" will be binlogged as "SELECT fn1()" with the error code 1062, not 0. Indeed the binlogging happens before DO resets the error code from 1062 to 0.
Note that Konstantin already verified, in his to-be-pushed fix for BUG#14077, that slave SQL thread can execute SELECT ok.

The logic above currently won't work because slave rewrites SELECTs as DOs; in Konstantin's patch this rewrite is removed, so then it will work. Conclusion: I'll wait for Konstantin to push his patch for BUG#14077 and then I'll fix the present bug.
[16 Nov 2005 13:21] Guilhem Bichot
it's not a crash but still will make slave stop in a not-so-rare situation, so raising it to P2.
[19 Jan 2006 13:27] Guilhem Bichot
The bug was observed when the user issues "select fn1()", but also exists when he issues "do fn1()", indeed in "do fn1()", the error recorded in binlog is 1062 even if DO returns 0 (this is because binlogging happens before the error code is cleared), and slave executes DO, ends up with an error code of 0, and compares that to 1062, so fails. The solution to both problems is unique: binlog "SELECT fn1()" instead of "DO fn1()" (and then "do fn1()" will be binlogged as "SELECT fn1()").
[19 Jan 2006 13:34] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

[20 Jan 2006 12:56] Guilhem Bichot
Setting back to "patch pending", for Sergey's second approval.
[24 Jan 2006 14:57] Guilhem Bichot
I will push the fix in 5.0 and 5.1 when 5.1 gets up to date first.
SergeyP's comment which I should implement is update the big comment in sp_head.cc.
[18 Feb 2006 16:27] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

[18 Feb 2006 16:35] Guilhem Bichot
  1.2062 06/02/18 17:26:30 guilhem@mysql.com +5 -0
  Fix for BUG#14769 "Function fails to replicate if fails half-way (slave stops)":
  if the function, invoked in a non-binlogged caller (e.g. SELECT, DO), failed half-way on the master,
  slave would stop and complain that error code between him and master mismatch.
  To solve this, when a stored function is invoked in a non-binlogged caller (e.g. SELECT, DO), we binlog the function
  call as SELECT instead of as DO (see revision comment of sp_head.cc for more).
[18 Feb 2006 20:34] Guilhem Bichot
the fix is in 5.0.19 and 5.1.8
[22 Feb 2006 17:59] Paul DuBois
Note in 5.0.19, 5.1.7 changelogs.

        Previously, a stored function invocation was written to the
        binary log as <literal>DO
        <replaceable>func_name</replaceable>()</literal> if the
        invocation changes data and occurs within a non-logged
        statement, or if the function invokes a stored procedure that
        produces an error. These invocations now are logged as
        <literal>SELECT <replaceable>func_name</replaceable>()</literal>
        instead for better control over error code checking (slave
        servers could stop due to detecting a different error than
        occurred on the master). (Bug #14769)