Bug #12815 Replication breaks if one drops FUNCTION when it is pre-loaded by other thead
Submitted: 25 Aug 2005 23:21 Modified: 27 Sep 2008 14:52
Reporter: Sergey Petrunya
Status: Verified
Category:Server: Replication Severity:S1 (Critical)
Version:5.0.13 OS:Any
Assigned to: Jon Olav Hauglid Target Version:
Triage: Triaged: D2 (Serious)

[25 Aug 2005 23:21] Sergey Petrunya
Description:
Replication may break if one drops FUNCTION in one thread when it is preloaded for
execution in another thread and later is executed by that thread.

This bug is a chopped of sub-problem in BUG#12335. The fix proposal for BUG#12335
contains a way to address this issue but it will break if the slave is restarted while it
has  already-deleted FUNCTION in the cache.

How to repeat:
source include/master-slave.inc;
connection master;
create table t1 (a int);
create table t2 (a char(200));

delimiter //;
create procedure p1()
begin
  declare dummy int;
  while ((select count(*) from t1) < 1) do
    set dummy = sleep(1);
  end while;
end//

create procedure p2()
begin
  select f1();
  
  call p1();
  delete from t1 limit 1;
  
  select f1();
  
  call p1();
  delete from t1 limit 1;
  
  select f1();
end//

create function f1() returns int 
begin
  insert into t2 values('f1-r1');
  return 0;
end// 
delimiter ;//

connection master;
send call p2();

connection master1;
delimiter //;

drop function f1//
create function f1() returns int 
begin
  insert into t2 values('f1-r2');
  return 0;
end// 
insert into t1 values (1) //

call p1()//

drop function f1//
create function f1() returns int 
begin
  insert into t2 values('f1-r3');
  return 0;
end// 
insert into t1 values (1) //

call p1()//

delimiter ;//

connection master;
reap;

select * from t2;

connection slave;
select * from t2;

# Clean up
connection master;
drop table t1;

sync_slave_with_master;

Suggested fix:
We now see the following approaches to fix this:

1) Disallow dropping of SP if it is preloaded for execution by some thread: either return
an error or delay the drop operation (with the goal of not writing it into binlog until it
may be used by some thread)

2) Implement several persistent SP caches on the slave. Basically copy the solution with
temporary tables (have a temporary mysql.proc-like table with additional cache-id
column)

3) Instead of writing DROP FUNCTION|PROCEDURE into the binary log, write RENAME FUNCTION,
use some naming scheme and tweaks with SP qname resolution.
[17 Oct 2005 12:10] Konstantin Osipov
Below is the test case that demonstrates the problem:

source include/master-slave.inc

#
# Case 1. No locks are held for a stored function while it's executed
#
# master
connection master;
create table t1 (a int);
create function f1() returns int return 1;
lock table t1 write;

# master1 
connect (master1,127.0.0.1,root,,test,$MASTER_MYPORT,);
connection master1;
send insert into t1 (a) values (f1());

# master2 
connect (master2,127.0.0.1,root,,test,$MASTER_MYPORT,);
connection master2;
drop function f1; -- goes to binlog
# Of course, if we do not recreate the function the slave will stop
# at execution of the insert, 
create function f1() returns int return 2;

# master 
connection master;
unlock tables;

# master1 
connection master1;
reap;

# check results
connection master;
select * from t1;
sync_slave_with_master;
connection slave;
select * from t1;

This test produces the following .result:
stop slave;
drop table if exists t1,t2,t3,t4,t5,t6,t7,t8,t9;
reset master;
reset slave;
drop table if exists t1,t2,t3,t4,t5,t6,t7,t8,t9;
start slave;
create table t1 (a int);
create function f1() returns int return 1;
lock table t1 write;
 insert into t1 (a) values (f1());
drop function f1;
create function f1() returns int return 2;
unlock tables;
select * from t1;
a
1
select * from t1;
a
2
drop table t1;
drop function if exists f1;

As one can easily see, the slave goes out of sync with master and inserts 2 into t1.
If we don't recreate function f1, the slave will issue an error at insert and stop,
which is a more disastrous manifestation of the problem.
[10 Mar 2006 17:25] Konstantin Osipov
A fix for this bug requires support for object-level read/write locking for all metadata
objects. This is a too big change to be implemented in 5.0.
This bug should be fixed when a new data dictionary is implemented.
[27 Sep 2008 14:52] Konstantin Osipov
This can be fixed now, with new metadata locking.
[22 Aug 11:03] Konstantin Osipov
Duplicate of Bug#30977