MySQL Bugs: #47926: Slave replication fails soon after starting

Bug #47926	Slave replication fails soon after starting
Submitted:	8 Oct 2009 22:24	Modified:	16 Nov 2009 17:59
Reporter:	Robert Bankay	Email Updates:
Status:	Not a Bug	Impact on me:	None
Category:	MySQL Server: Replication	Severity:	S2 (Serious)
Version:	5.1.36	OS:	Linux (Fedora 11)
Assigned to:		CPU Architecture:	Any
Tags:	replication, slave

Description:
Impact: Cannot use replication.

The following is taken from the slave status:

Could not execute Update_rows event on table ......
Can't find record in 'workunit', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND;

--------------------------------------
We are using query replication only. It does not show the details of the query that failed.

The master and slave are both running MySQL 5.1.26;
the table is innodb and can be accessed from a mysql client.

There is no indication of what record in the table is not being found in the slave.

Here is the slave MySQL error log:

091008 14:55:07 [Note] Slave SQL thread initialized, starting replication in log 'mork_mysql-bin.000026' at position 26399, relay log '/mydisks/a/apps/mysql/data/mysqld-relay-bin.000004' position: 26549
091008 14:55:07 [ERROR] Slave SQL: Could not execute Update_rows event on table SETI_BOINC.workunit; Can't find record in 'workunit', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mork_mysql-bin.000026, end_log_pos 27823, Error_code: 1032
091008 14:55:07 [Warning] Slave: Can't find record in 'workunit' Error_code: 1032
091008 14:55:07 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mork_mysql-bin.000026' position 26399

How to repeat:
Not sure. We upgraded from MySQL 5.0 to 5.1 and used mysqldump to create a db backup then then used that backup restore the slave. The restored db on the slave fails when replication is started.

Suggested fix:
No idea.

We are using Fedora 11 (64 bit) as well as 64 bit MySQL 5.1.26.

5.1.26 is more than one year old. Please, check with 5.1.39. 

As for statement that caused the problem, you have the information about binary log and postion on master and relay log and position on slave. With statement based replication all you need to see the exact query is to use that values with mysqlbinlog... Please, check.

Found this statement in the relay log that had no corresponding row in the slave db.  (The master is way out of sync now and while it does not have the row, it could have been deleted by a daily process):

SETI_BOINC^@update workunit set need_validate=0, error_mask=0, assimilate_state=1, transition_time=2147483647, target_nresults=2, canonical_resultid=1374954643, canonical_credit=3.130572140075570e+01 where id=510125278

The value for workunit.id does not exist in the slave db.  It is not clear why the restored table would not have a row that was being required by the replication process.  Could it be that a reset slave occurred but not a corresponding reset master?

Do you use innodb tables?

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

Sorry.  I thought that I sent info that we got around one of the problems by setting the mysqldump option for 'single inserts'.  Restores work fine with that option. The tables in question were innodb tables.

And we found that we had reset the slave and not done a corresponding reset of the master server;  so the binary/relay log was starting from 8 days prior and attempting the update a row that had already been deleted at some point since/

Please close this trouble report.

So, looks like this problem was not a result of any bug in MySQL code.

The recovery problem we had was not a bug; however the mysqldump formatting problem when using 'mysqldump --opt' for (innodb?) large tables is still an issue but it can be worked around.

So since a work around exists, then I would not classify this as a serious bug.  Or you can close this out as I suggested previously.

Thanks,
Bob B