Bug #28829 replication breaks due to binlogs events written in wrong order
Submitted: 1 Jun 2007 10:01 Modified: 13 Jun 2007 13:35
Reporter: Shane Bester (Platinum Quality Contributor) Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: Replication Severity:S2 (Serious)
Version:4.1.21,5.0.44 OS:Any
Assigned to: Georgi Kodinov CPU Architecture:Any
Tags: replication

[1 Jun 2007 10:01] Shane Bester
Description:
replication can break in this scenario because the events in binlogs are in wrong order.

#connection 1

set autocommit=0;
drop table if exists t3;
create table t3(cnt int, name varchar(5))engine=innodb;
insert into t3(cnt,name) select count(*),t1.name from t1,t2 where t1.id=t2.id group by t1.name; #insert takes some minutes or seconds to run

While the insert runs, do this in connection 2.

#connection 2
set autocommit=0;
drop table if exists t3;

problem is the slave fails with message:

070601 11:54:04 [ERROR] Slave: Error 'Table 'test.t3' doesn't exist' on query. Default database: 'test'. Query: 'insert into t3(cnt,name) select count(*),t1.name from t1,t2 where t1.id=t2.id group by t1.name', Error_code: 1146
070601 11:54:04 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'www-bin.000001' position 2225004

If you look in the binlogs, the table was dropped before the insert!

How to repeat:
It's repeatable with the command line client, but takes a few attempts as it appears to be some timing problem.

To make it easier to reproduce, I'll attach a C testcase now which will cause the error quicker (but it is slightly more complex).

Suggested fix:
.
[1 Jun 2007 10:09] MySQL Verification Team
testcase. point it to the master server and run. watch slave error logs. .  see compile, host, user examples a top of the file.

Attachment: bug28829.c (text/plain), 7.19 KiB.

[1 Jun 2007 10:12] MySQL Verification Team
I started master with clean datadir on 192.168.250.2

./bin/mysqld_safe --skip-grant-tables --server-id=5 --log-bin --port=3307 --skip-name-resolve &

I started slave on 127.0.0.1 with clean datadir:

mysqld-debug  --console --skip-grant-tables --server-id=3 --log --log-bin --port=3306

On slave, initialized replication:
CHANGE MASTER TO MASTER_HOST='192.168.250.2', MASTER_PORT=3307, MASTER_USER='root', MASTER_PASSWORD='12345';

START SLAVE;

Then, ran the attached testcase. Slave logs show error in < 1 minute.
[5 Jun 2007 11:16] MySQL Verification Team
Possibly related to:

bug #25144
bug #989
[13 Jun 2007 13:35] Konstantin Osipov
This bug is a duplicate of Bug#989 "If DROP TABLE while there's an active transaction, wrong binlog order".
Please subscribe to the older bug if you're interested in progress on this problem.
Thank you.