Bug #55305 Rows based replication not working
Submitted: 15 Jul 2010 22:03
Modified: 29 May 2011 14:47
Reporter: Jamie Koc
Status: Not a Bug
Category: MySQL Server: Row Based Replication (RBR)
Severity: S1 (Critical)
Version: 5.1.42
OS: Linux
CPU Architecture: Any
Tags: rows based replication

[15 Jul 2010 22:03] Jamie Koc
Description:
I apologize for the non-descriptive synopsis, but I can't seem to get row-based replication working at all.

I have a master->slave1->slave2 configuration.
master->slave1 is using statement based replication (working fine)
slave1->slave2 is rows based (replicating 2 databases using replicate-wild-do-table). 

In slave1 my.cnf:
binlog_format=ROW

I am not seeing any errors when running SHOW SLAVE STATUS on slave2. However, Seconds_Behind_Master keeps increasing, more and more relay logs are being created, and SHOW PROCESSLIST just says "Reading event from the relay log".

If you need any more information, please let me know.

Thanks!

How to repeat:
1. Configure a master->slave1->slave2 configuration
2. Replicate a database from master->slave1 using statement based replication 
3. Replicate a database from slave1->slave2 using rows based replication 

In slave1 my.cnf:
binlog_format=ROW
log-slave-updates

In slave2 my.cnf:
replicate-wild-do-table = dbname_here.%
[16 Jul 2010 3:30] Jamie Koc
I am not able to issue a STOP SLAVE command on slave2, which is receiving the row-based replication. The server gets stuck with state = Killing slave.
When I run service mysql stop, the server hangs trying to shut down and I have to do a kill -9 on the process ID.
[16 Jul 2010 16:37] Jamie Koc
After reloading and restarting the server, I was able to get row-based replication up and running for a brief moment.

I encountered this error:
Could not execute Delete_rows event on table test; 
Can't find record in 'test', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log as11-bin.009976, end_log_pos 913421094

How do I identify which transaction the slave is failing on, or obtain the SQL statement? Here is the information from the logs.

From master bin-log
#100716  3:14:06 server id 98001  end_log_pos 913421094         Delete_rows: table id 13624

Slave relay-log
# at 425442804
#100716  3:14:06 server id 98001  end_log_pos 913419940         Query   thread_id=286136        exec_time=1078  error_code=0
SET TIMESTAMP=1279264446/*!*/;
BEGIN

After issuing SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1 and starting the slave, I noticed that the relay log position is not changing at all. Once again I cannot stop the slave and have to kill the processes using kill -9.
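
On the question above of locating the failing event: the error message gives an end_log_pos, and the event headers printed by mysqlbinlog carry the same field, so the decoded log can be searched for it. The following is a minimal sketch rather than an official tool. It assumes the log has already been decoded to text with mysqlbinlog (the --base64-output=DECODE-ROWS and --verbose options make row events readable), and the function name find_events is hypothetical.

```python
import re

# "# at N" gives the byte offset of the next event in the decoded file.
AT_RE = re.compile(r"^# at (\d+)\s*$")
# Event header as printed by mysqlbinlog, for example:
# "#100716  3:14:06 server id 98001  end_log_pos 913421094  Delete_rows: ..."
HEADER_RE = re.compile(
    r"^#\d{6}\s+\d{1,2}:\d{2}:\d{2}\s+server id\s+(\d+)"
    r"\s+end_log_pos\s+(\d+)\s*(.*)$"
)


def find_events(lines, end_log_pos):
    """Return the event headers whose end_log_pos equals the given value.

    `lines` is any iterable of text lines from decoded mysqlbinlog output.
    Each hit records the preceding "# at" offset, or None if none was seen.
    """
    found = []
    last_at = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        at = AT_RE.match(line)
        if at:
            last_at = int(at.group(1))
            continue
        header = HEADER_RE.match(line)
        if header:
            if int(header.group(2)) == end_log_pos:
                found.append({
                    "at": last_at,
                    "server_id": int(header.group(1)),
                    "end_log_pos": int(header.group(2)),
                    "event": header.group(3).strip(),
                })
            # The offset belongs to this header only; do not reuse it.
            last_at = None
    return found
```

Called with the lines of the decoded relay log and 913421094, this should return the Delete_rows header quoted above. The "at" offset of a hit can then be given to mysqlbinlog --start-position to print that event and what follows it.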
[17 Jul 2010 18:42] Sveta Smirnova
Thank you for the report.

Version 5.1.42 is old, and several replication-related bugs have been fixed since then. Please upgrade to the current version, 5.1.48, and retest. If the problem still exists, please tell us whether slave2 is read-only.
[17 Aug 2010 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[26 Aug 2010 18:37] Jamie Koc
Upgrading to 5.1.48 resolved all issues I was having.

Thanks!
[22 Sep 2010 17:07] Jamie Koc
I need to reopen this bug. I am still having issues with row based replication.

I am replicating from a master (version 5.1.42) to 2 different slaves (both version 5.1.48). One slave is replicating perfectly, the 2nd slave is not.
Replication gets stuck and the relay logs are not processed; the slave stays at the same position. I cannot stop the slave or stop the MySQL service on the 2nd slave.
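
The "stuck in the same position" symptom can be checked by comparing two SHOW SLAVE STATUS samples taken some time apart. The sketch below assumes each sample has been loaded into a dict keyed by the column names of that statement; the function name sql_thread_stalled is hypothetical. Note that a single very slow row event would look the same as a hang from this view.

```python
def sql_thread_stalled(before, after):
    """Compare two SHOW SLAVE STATUS samples (dicts keyed by column name).

    Returns True when the I/O thread kept reading from the master while
    the SQL thread, although running, stayed at the same relay log position.
    """
    io_advanced = (
        after["Master_Log_File"] != before["Master_Log_File"]
        or int(after["Read_Master_Log_Pos"]) > int(before["Read_Master_Log_Pos"])
    )
    sql_unmoved = (
        after["Relay_Log_File"] == before["Relay_Log_File"]
        and int(after["Relay_Log_Pos"]) == int(before["Relay_Log_Pos"])
    )
    return after["Slave_SQL_Running"] == "Yes" and io_advanced and sql_unmoved
```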
[24 Sep 2010 17:01] Jamie Koc
Sorry guys.
I just realized a table was missing a primary key.
This caused rows based replication to slow down significantly.

I am closing the bug.
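
For anyone hitting the same slowdown, tables lacking a primary key can be listed from information_schema. This is a minimal sketch, assuming a Python DB-API cursor that uses the %s parameter style and returns rows as tuples; the function name tables_without_primary_key is hypothetical. It only checks for a PRIMARY KEY constraint and says nothing about other indexes.

```python
# Tables in one schema that have no PRIMARY KEY constraint.
NO_PRIMARY_KEY_SQL = """
SELECT t.table_name
FROM information_schema.tables AS t
LEFT JOIN information_schema.table_constraints AS c
       ON c.table_schema = t.table_schema
      AND c.table_name = t.table_name
      AND c.constraint_type = 'PRIMARY KEY'
WHERE t.table_schema = %s
  AND t.table_type = 'BASE TABLE'
  AND c.constraint_name IS NULL
ORDER BY t.table_name
"""


def tables_without_primary_key(cursor, schema):
    """Return the names of base tables in `schema` that lack a primary key.

    `cursor` is any DB-API cursor connected to the MySQL server to inspect.
    """
    cursor.execute(NO_PRIMARY_KEY_SQL, (schema,))
    # Assumes tuple rows; a dict-style cursor would need row["table_name"].
    return [row[0] for row in cursor.fetchall()]
```

Running it on the slave against each replicated database (dbname_here in the configuration above) should show whether any replicated table is affected.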
[8 Oct 2010 13:13] Valeriy Kravchuk
Not a bug in MySQL, according to the last comment.
[29 May 2011 14:47] Jamie Koc
After getting the error again, I decoded the relay log and noticed that autocommit is set a few statements before the TRUNCATE TABLE was executed.

SET TIMESTAMP=1306532422/*!*/;
SET @@session.autocommit=1/*!*/;

Could this be causing the problem?

I don't see any LOCK TABLES statements in the relay log.