Bug #14418 RBR: Uncertain behaviour when stopping updates in the middle
Submitted: 28 Oct 2005 9:02
Modified: 9 Dec 2005 14:36
Reporter: Mats Kindahl
Status: Closed
Category: MySQL Server: Replication
Severity: S2 (Serious)
Version:
OS:
Assigned to: Guilhem Bichot
CPU Architecture: Any

[28 Oct 2005 9:02] Mats Kindahl

During the execution of binrow events (Rows events), the group
relay log position is not updated until all events associated with
the current table map have been executed.  That way we guarantee that,
after a crash, we do not restart at a binrow event for which we have
not seen the corresponding table map event.
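
To make the rule above concrete, here is a minimal sketch of how the group position would move. It is not the server code: the `Event` class, the offsets and the `ends_group` field are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Event:
    start: int                 # hypothetical relay log offset where the event begins
    end: int                   # hypothetical offset just past the event
    kind: str                  # "table_map" or "rows"
    ends_group: bool = False   # True on the final Rows event of the group


def group_positions(events):
    """Return the group relay log position recorded after each event."""
    group_pos = events[0].start if events else 0
    recorded = []
    for ev in events:
        # Executing the event itself is omitted in this sketch.
        if ev.kind == "rows" and ev.ends_group:
            # The whole group is done, so a restart may begin after it.
            group_pos = ev.end
        recorded.append(group_pos)
    return recorded


events = [
    Event(100, 150, "table_map"),
    Event(150, 300, "rows"),
    Event(300, 450, "rows", ends_group=True),
]
print(group_positions(events))  # [100, 100, 450]
```

The position stays at the Table_map offset until the last Rows event has run, so a crash at any earlier point restarts from the Table_map.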


**Guilhem**: A big problem is (as we discussed) that if this is a
MyISAM update taking several Rows events in the binlog, and the slave
is stopped before executing the last event, it will later restart at
the Table_map event, thus inserting rows which have already been
inserted. OTOH Table_map is needed for the slave to restart.  One way
is to make STOP SLAVE refuse to stop if the last executed event is a
Rows event which is not really the last one, or refuse to stop as long
as there are "active" (don't know if this is relevant) table mappings
(mappings which would be needed at restart).  I think this can also be
fixed with interleaving, where we'll have two coordinates. In the above
case we could say that the coordinate "where to restart from" is the
Table_map, and the coordinate "up to which everything must be rolled
back" (i.e. for MyISAM "up to which nothing must be executed") is the
coordinate where the slave stopped.
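
A small sketch of the two-coordinate idea, to show why a single coordinate is not enough for MyISAM. Everything here is an assumption made for illustration: the `replay` function, the tuple layout and the offsets do not come from the server source.

```python
def replay(events, restart_pos, executed_up_to, table):
    """Replay relay log events from restart_pos into table (a list of rows).

    events: list of (start, end, kind, rows) tuples, all hypothetical.
    restart_pos: the coordinate "where to restart from".
    executed_up_to: the coordinate up to which Rows events were already applied.
    """
    mapping_seen = False
    for start, end, kind, rows in events:
        if start < restart_pos:
            continue
        if kind == "table_map":
            mapping_seen = True
        elif not mapping_seen:
            raise RuntimeError("Rows event without its Table_map")
        elif end > executed_up_to:
            table.extend(rows)   # not applied before the stop, so apply it now
    return table


log = [
    (100, 150, "table_map", []),
    (150, 300, "rows", [1, 2]),
    (300, 450, "rows", [3, 4]),
]

# The slave stopped at offset 300, after applying only the first Rows event.
naive = replay(log, restart_pos=100, executed_up_to=0, table=[1, 2])
fixed = replay(log, restart_pos=100, executed_up_to=300, table=[1, 2])
print(naive)  # [1, 2, 1, 2, 3, 4]
print(fixed)  # [1, 2, 3, 4]
```

Restarting at offset 300 instead raises the `RuntimeError`, which is the "Table_map is needed" half of the problem.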

How to repeat:
Code inspection [will add testcase later].
[2 Dec 2005 21:21] Guilhem Bichot
A temporary solution will be implemented first: if the slave SQL thread is asked to stop in the middle of an RBR Rows_log_event group updating a non-transactional, non-primary-key table, it will wait until it has executed the last Rows_log_event (the one with the TRANS_END_F flag). If this last event cannot be found, the thread will stop after a timeout and issue an inconsistency warning in the error log.
WL#2975 is a more complete solution which will be implemented later.
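
A rough sketch of the temporary behaviour, for readers who want to see the control flow. It is a simulation under stated assumptions, not the patch: `sql_thread`, `stop_after` and `max_idle_polls` are hypothetical, the string `"rows_end"` stands for a Rows_log_event carrying the TRANS_END_F flag, `None` stands for a poll that found no event in the relay log, and the check for a non-transactional, non-primary-key table is left out.

```python
def sql_thread(feed, stop_after, max_idle_polls=3):
    """Simulate the SQL thread; return (executed_events, warning_or_None).

    feed: sequence of "table_map", "rows", "rows_end", or None.
    stop_after: number of feed items consumed before the stop request arrives.
    """
    executed = []
    in_group = False
    idle = 0
    for i, ev in enumerate(feed):
        stop_requested = i >= stop_after
        if stop_requested and not in_group:
            return executed, None          # safe point: stop immediately
        if ev is None:
            if stop_requested:
                idle += 1
                if idle >= max_idle_polls:
                    # Timeout: give up waiting for the last Rows event.
                    return executed, "stopped in the middle of a Rows group"
            continue
        idle = 0
        executed.append(ev)
        if ev == "table_map":
            in_group = True
        elif ev == "rows_end":
            in_group = False
    return executed, None                  # feed exhausted


print(sql_thread(["table_map", "rows", "rows", "rows_end", "table_map"], 2))
# (['table_map', 'rows', 'rows', 'rows_end'], None)
print(sql_thread(["table_map", "rows", None, None, None, None], 2))
# (['table_map', 'rows'], 'stopped in the middle of a Rows group')
```

In the first call the stop request arrives mid-group and is deferred until the final Rows event has run. In the second the final event never arrives, so the thread stops after three idle polls and reports the warning.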
[7 Dec 2005 14:30] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

[9 Dec 2005 14:36] Guilhem Bichot
Thank you for your bug report. This issue has been committed to our
source repository of that product and will be incorporated into the
next release.

If necessary, you can access the source repository and build the latest
available version, including the bugfix, yourself. More information 
about accessing the source trees is available at

Additional info:

This bug is temporarily fixed by implementing the solution described in the comments above.
WL#2975 will be implemented later (and then that temporary fix can be removed).