Bug #31581 | 5.1-telco-6.1 -> 5.1.22. Slave crashes during starting | ||
---|---|---|---|
Submitted: | 13 Oct 2007 15:55 | Modified: | 6 Feb 2008 15:56 |
Reporter: | Serge Kozlov | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S1 (Critical) |
Version: | 5.1.22 | OS: | Any |
Assigned to: | Sven Sandberg | CPU Architecture: | Any |
Tags: | replication, telco, upgrade, version |
[13 Oct 2007 15:55]
Serge Kozlov
[13 Oct 2007 16:04]
Serge Kozlov
testcase
Attachment: bug31581.tgz (application/x-compressed, text), 35.01 KiB.
[22 Oct 2007 16:32]
Sven Sandberg
== Detailed description of what goes wrong ====== The bug happens for when replicating write, update, or delete rows events from a 5.1.0-5.1.15 master to a 5.1.16-or-later slave. There old RBR events (write, update, delete) had event_type 20, 21, 22, respectively. These were changed in 5.1.16 and we now use three new event_types; 23, 24, 25. In the new code, event types 23, 24, 25 map to `{Write,Update,Delete}_rows_log_event' and event types 20, 21, 22 map to classes `{Write,Update,Delete}_rows_log_event_old'. Each of the latter three classes is a subclass of the corresponding class without the _old suffix, and each of the thee former classes is a subclass of Rows_log_event. For simplicity, I will now only talk about write events. However, the discussions applies to update and delete events too. When an old master replicates a write event (number 20) to a new slave, the slave will construct Write_rows_log_event_old events. In the constructor Write_rows_log_event_old(char *buf, uint event_len, Format_description_event*), it merely calls the same constructor for `Write_rows_log_event', which in turn calls the constructor for `Rows_log_event' with an extra argument `Log_event_type event_type' set to `WRITE_ROWS_EVENT'. Note that even for Write_rows_log_event_old, the event_type is set to WRITE_ROWS_EVENT = 23, and not PRE_GA_WRITE_ROWS_EVENT = 20. The event_type is used for two purposes: - event_type is used in a condition to execute code specific to update events (and common to old and new events). This code is ok. - event_type is used to extract the post_header_length from the Format_description_log_event. This is where the bug is. When a Format_description_log_event is sent by an old master, it only describes events up to number 22; therefore, description_event->post_header_len has only 22 bytes allocated on the slave. In this Rows_log_event constructor, the event_type is used as an index in this array, and hence reads garbage outside the array.
[22 Oct 2007 18:02]
Sven Sandberg
== Fix ====== Pass event_type=PRE_GA_WRITE_ROWS_EVENT instead of event_type=WRITE_ROWS_EVENT to the constructor of Rows_log_event. This entails adding the event_type parameter to the constructor for Write_rows_log_event and checking if (event_type == PRE_GA_WRITE_ROWS_EVENT || event_type == WRITE_ROWS_EVENT) instead of if (event_type == WRITE_ROWS_EVENT) in the constructor for Rows_log_event. We can either modify the existing constructor (so there will no longer be any {Write|Update|Delete}_rows_log_event(const char *buf...) constructor with only 3 parameters), or add a new constructor (so there is one such constructor with 3 arguments and one with 4 arguments). I think it is better to modify the constructor, because the constructor is called at very few places, so adding more constructors would bloat the code. Also, there is no risk to call the wrong constructor by mistake and, e.g., pass a WRITE_ROWS_EVENT (implicitly) when the correct behavior would have been to pass a PRE_GA_WRITE_ROWS_LOG_EVENT.
[23 Oct 2007 7:35]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/36099 ChangeSet@1.2575, 2007-10-22 20:05:21+02:00, sven@murkla.(none) +3 -0 BUG#31581 5.1-telco-6.1 -> 5.1.22. Slave crashes during starting Added an extra argument `Log_event_type event_type' to the constructors {Write|Update|Delete}_rows_log_event(const char *buf...), because the constructor may be called from the constructor of the corresponding subclass {Write|Update|Delete}_rows_log_event_old(const char *buf...), in which case another entry from description_event->post_header_len must be read.
[29 Oct 2007 13:07]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/36556 ChangeSet@1.2584, 2007-10-29 14:07:40+01:00, sven@murkla.(none) +2 -0 BUG#31581: 5.1-telco-6.1 -> 5.1.22. Slave crashes during starting Copied all code from {Write|Update|Delete}_rows_log_event to {Write|Update|Delete}_rows_log_event_old and from Rows_log_event to Old_rows_log_event. This is just a stupid copy-and-paste; it will not even compile. Updates will be committed in separate changesets.
[29 Oct 2007 14:16]
Sven Sandberg
== New Fix ====== I talked with Mats, and we decided on another way to solve the problem. An underlying problem causing the bug is that the new code needs to be aware of the old code. To avoid dependencies between old events and new events, I will make {Write|Update|Delete}_rows_log_event_old not be subclasses of {Write|Update|Delete}_rows_log_event. Then the new code can be changed freely without taking into consideration all the different upgrade scenarios. This is a much bigger change than the above one, and I will make it in several steps. The current class hierarchy is like this: -----------------> Log_event <------------------ / \ / \ / \ Rows_log_event Old_rows_log_event ^ ^ ^ ^ ^ ^ | | | | | | | | | | | | | | Write_rows_log_event <---- Write_rows_log_event_old | | | | | | | | | | | Update_rows_log_event <-------------- Update_rows_log_event_old | | | | | Delete_rows_log_event <-------------------------- Delete_rows_log_event_old The goal is to remove the three left-going arrows. To this end, I will merge Rows_log_event into Old_rows_log_event and merge {Write|Update|Delete}_rows_log_event into {Write|Update|Delete}_rows_log_event_old, respectively. I will commit this in several steps. (I will not push the patch from 23 Oct 9:35, please ignore it.) The first patch (committed on 29 Oct 17:07) merely copies all code from {Write|Update|Delete}_rows_log_event to {Write|Update|Delete}_rows_log_event_old and from Rows_log_event to Old_rows_log_event. The second patch will make constructors work correctly, and change all places that mention {Write|Update|Delete}_rows_log_event so that they instead mention {Write|Update|Delete}_rows_log_event_old. The third patch will probably make everything else work (e.g., remove some fields that are duplicated in Write_rows_log_event and Write_rows_log_event_old, etc) so that it compiles. However, I may break down the procedure into more steps, if that seems useful.
[31 Oct 2007 10:09]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/36737 ChangeSet@1.2585, 2007-10-31 11:09:17+01:00, sven@murkla.(none) +2 -0 BUG#31581: 5.1-telco-6.1 -> 5.1.22. Slave crashes during starting made it compile and work exactly like before the first patch.
[31 Oct 2007 10:11]
Sven Sandberg
== Steps to fix ====== Correction to my comment above: Old_rows_log_event is not a subclass of Log_event. I will submit four patches: (1) Copy-and-paste contents of Rows_log_event into Old_rows_log_event, and contents of {Write|Update|Delete}_rows_log_event into {Write|Update|Delete}_rows_log_event_old. I will not even remove duplicate methods or rename constructors; it is a pure copy-and-paste. Done on Oct 29 14:07. (2) Make it compile and work exactly like before. This includes: - merge some constructors into one; remove some destructors - rename get_type_code to xget_type_code and TYPE_CODE to XTYPE_CODE in copied code and declare a pure virtual xget_type_code in Log_event_old - remove `friend class Old_rows_log_event' from Rows_log_event and Old_rows_log_event - make Old_rows_log_event inherit from Log_event and *_rows_log_event_old inherit from Old_rows_log_event. - remove static functions last_uniq_key() and record_compare(), which were defined in both log_event.cc and log_event_old.cc and copied another time to log_event_old.cc in step (1), as well as the non-static duplicate_key_error() which was copied from log_event.cc to log_event_old.cc but not used in log_event_old.cc. See following comments. (3) Fix the bug. See following comments. (4) Re-arrange log_event_old.{cc,h} a bit. After copying and pasting, the methods are not in any structured order in the file. Remove some temporary comments.
[31 Oct 2007 10:14]
Sven Sandberg
== DETAILS ON CONSTRUCTORS FOR PATCH 2 ====== The following list contains; for each of rows, write_rows, delete_rows, and update_rows; all constructors that appear either in the new class or in the old class. For each constructor, it is indicated whether the constructor is defined in the .cc file or in the .h file; any preprocessor conditions under which the constructor is defined; the constructor head; any superclass constructors called by the constructor; and a text indicating what was done with the constructor. ---------- ROWS ---------- .cc #ifndef MYSQL_CLIENT Rows_log_event::Rows_log_event(THD *thd_arg, TABLE *tbl_arg, ulong tid, MY_BITMAP const *cols, bool is_transactional) : Log_event(thd_arg, 0, is_transactional), No changes needed .cc Rows_log_event::Rows_log_event(const char *buf, uint event_len, Log_event_type event_type, const Format_description_log_event *description_event) : Log_event(buf, description_event), No changes needed .cc Rows_log_event::~Rows_log_event() No changes needed .h virtual ~Old_rows_log_event() {} Remove (does not do anything; the destructor above should be used instead) ---------- WRITE ---------- .cc #if !defined(MYSQL_CLIENT) Write_rows_log_event::Write_rows_log_event(THD *thd_arg, TABLE *tbl_arg, ulong tid_arg, MY_BITMAP const *cols, bool is_transactional) : Rows_log_event(thd_arg, tbl_arg, tid_arg, cols, is_transactional) No changes needed .cc #ifdef HAVE_REPLICATION Write_rows_log_event::Write_rows_log_event(const char *buf, uint event_len, const Format_description_log_event *description_event) : Rows_log_event(buf, event_len, WRITE_ROWS_EVENT, description_event) No changes needed .h #if !defined(MYSQL_CLIENT) Write_rows_log_event_old(THD *thd, TABLE *table, ulong table_id, MY_BITMAP const *cols, bool is_transactional) : Write_rows_log_event(thd, table, table_id, cols, is_transactional) Remove (identical to Write_rows_log_event constructor with same args) .h #if defined(HAVE_REPLICATION) Write_rows_log_event_old(const char *buf, uint event_len, const Format_description_log_event *descr) : Write_rows_log_event(buf, event_len, descr) Remove (identical to Write_rows_log_event constructor with same args) ---------- DELETE ---------- .cc #ifndef MYSQL_CLIENT Delete_rows_log_event::Delete_rows_log_event(THD *thd_arg, TABLE *tbl_arg, ulong tid, MY_BITMAP const *cols, bool is_transactional) : Rows_log_event(thd_arg, tbl_arg, tid, cols, is_transactional) Add initialization from _old constructor .cc #ifdef HAVE_REPLICATION Delete_rows_log_event::Delete_rows_log_event(const char *buf, uint event_len, const Format_description_log_event *description_event) : Rows_log_event(buf, event_len, DELETE_ROWS_EVENT, description_event) Add initialization from _old constructor .h #if !defined(MYSQL_CLIENT) Delete_rows_log_event_old(THD *thd, TABLE *table, ulong table_id, MY_BITMAP const *cols, bool is_transactional) : Delete_rows_log_event(thd, table, table_id, cols, is_transactional), Remove and merge with Delete_rows_log_event constructor with same args. .h #if defined(HAVE_REPLICATION) Delete_rows_log_event_old(const char *buf, uint event_len, const Format_description_log_event *descr) : Delete_rows_log_event(buf, event_len, descr), Remove and merge with Delete_rows_log_event constructor with same args. ---------- UPDATE ---------- .cc #if !defined(MYSQL_CLIENT) Update_rows_log_event::Update_rows_log_event(THD *thd_arg, TABLE *tbl_arg, ulong tid, MY_BITMAP const *cols_bi, MY_BITMAP const *cols_ai, bool is_transactional) : Rows_log_event(thd_arg, tbl_arg, tid, cols_bi, is_transactional) Remove (unused). .cc #if !defined(MYSQL_CLIENT) Update_rows_log_event::Update_rows_log_event(THD *thd_arg, TABLE *tbl_arg, ulong tid, MY_BITMAP const *cols, bool is_transactional) : Rows_log_event(thd_arg, tbl_arg, tid, cols, is_transactional) Add initialization from _old constructor .cc Update_rows_log_event::~Update_rows_log_event() No changes needed .cc #ifdef HAVE_REPLICATION Update_rows_log_event::Update_rows_log_event(const char *buf, uint event_len, const Format_description_log_event *description_event) : Rows_log_event(buf, event_len, UPDATE_ROWS_EVENT, description_event) Add initialization from _old constructor. .h #if !defined(MYSQL_CLIENT) Update_rows_log_event_old(THD *thd, TABLE *table, ulong table_id, MY_BITMAP const *cols, bool is_transactional) : Update_rows_log_event(thd, table, table_id, cols, is_transactional), Remove from here, merge with Update_rows_log_event constructor with same args. .h #if defined(HAVE_REPLICATION) Update_rows_log_event_old(const char *buf, uint event_len, const Format_description_log_event *descr) : Update_rows_log_event(buf, event_len, descr), Remove from here, merge with Update_rows_log_event constructor with same args.
[31 Oct 2007 10:17]
Sven Sandberg
== DETAILS ON METHODS AND FIELDS FOR PATCH 2 ====== The following list contains each method or field that appears in any of the old classes. For each method or field, the list describes all declarations of it; followed by a text indicating what should be done with it. --------- do_apply_event --------- In Rows_log_event: virtual int do_apply_event(Relay_log_info const *rli); In Old_rows_log_event: int do_apply_event(Rows_log_event*,const Relay_log_info*); In each {Update|Delete|Write}_rows_log_event_old: virtual int do_apply_event(const Relay_log_info *rli) { return Old_rows_log_event::do_apply_event(this,rli); } No action needed (different args) ---------- do_before_row_operations ---------- In Rows_log_event: virtual int do_before_row_operations(const Slave_reporting_capability *const log) = 0; In Old_rows_log_event: virtual int do_before_row_operations(TABLE *table) = 0; In each {Update|Delete|Write}_rows_log_event: virtual int do_before_row_operations(const Slave_reporting_capability *const); In each {Update|Delete|Write}_rows_log_event_old: virtual int do_before_row_operations(TABLE *table); No action needed (different args) ---------- do_after_row_operations ---------- In Rows_log_event: virtual int do_after_row_operations(const Slave_reporting_capability *const log, int error) = 0; In Old_rows_log_event: virtual int do_after_row_operations(TABLE *table, int error) = 0; In each {Update|Delete|Write}_rows_log_event: virtual int do_after_row_operations(const Slave_reporting_capability *const,int); In each {Update|Delete|Write}_rows_log_event_old: virtual int do_after_row_operations(TABLE *table, int error); No action needed (different args) ---------- do_prepare_row ---------- In Old_rows_log_event and each subclass: virtual int do_prepare_row(THD*, Relay_log_info const*, TABLE*, uchar const *row_start, uchar const **row_end) = 0; No action needed (different args) ---------- do_exec_row ---------- In Rows_log_event: virtual int do_exec_row(const Relay_log_info *const rli) = 0; In Old_rows_log_event: virtual int do_exec_row(TABLE *table) = 0; In each {Update|Delete|Write}_rows_log_event: virtual int do_exec_row(const Relay_log_info *const); In each {Update|Delete|Write}_rows_log_event_old: virtual int do_exec_row(TABLE *table); No action needed (different args) ---------- get_type_code and TYPE_CODE ---------- In each {Update|Delete|Write}_rows_log_event[_old]: enum { TYPE_CODE = XXXX }; virtual Log_event_type get_type_code() { return (Log_event_type)TYPE_CODE; } Action: in each of {Update|Delete|Write}_rows_log_event, replace by enum { XTYPE_CODE = {WRITE|UPDATE|DELETE}_ROWS_EVENT }; virtual Log_event_type xget_type_code() { return (Log_event_type)XTYPE_CODE; } Rationale: the current patch should make the behavior identical to what it was before. Next patch (which will fix the bug) will remove these, replace all calls to xget_type_code() by calls to get_type_code(), and replace any occurrencies of {WRITE|UPDATE|DELETE}_ROWS_EVENT by PRE_GA_{WRITE|UPDATE|DELETE}_ROWS_EVENT. ---------- m_after_image, m_memory ---------- In each of {Update|Delete}_rows_log_event_old: uchar *m_after_image, *m_memory; No action needed (only in _old classes).
[31 Oct 2007 10:44]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/36741 ChangeSet@1.2586, 2007-10-31 11:43:35+01:00, sven@murkla.(none) +2 -0 BUG#31581: 5.1-telco-6.1 -> 5.1.22. Slave crashes during starting Made the event_type argument for Old_rows_log_event(char *...) constructor be PRE_GA_*_ROWS_EVENT instead of *_ROWS_EVENT (this fixes the bug). Also removed xget_type_code() and used get_type_code() instead (the x version was introduced in previous patch just to make it compile).
[31 Oct 2007 12:30]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/36750 ChangeSet@1.2587, 2007-10-31 13:29:49+01:00, sven@murkla.(none) +2 -0 BUG#31581: 5.1-telco-6.1 -> 5.1.22. Slave crashes during starting - Improved code formatting. (Even if that this changes many lines of the file, it should be safe against conflicts because the code was introduced in my previous patch, so noone else edited it.) - Added comments to header file explaining what the file contains and what the classes do. - Added assert(0) to code that should not be reached.
[31 Oct 2007 13:00]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/36753 ChangeSet@1.2584, 2007-10-31 13:59:41+01:00, sven@murkla.(none) +2 -0 BUG#31581: 5.1-telco-6.1 -> 5.1.22. Slave crashes during starting - Merged Rows_log_event into Old_rows_log_event and {Write|Update|Delete}_rows_log_event into {Write|Update|Delete}_rows_log_event_old. - Fixed the bug by replacing {WRITE|UPDATE|DELETE}_ROWS_EVENT by PRE_GA_{WRITE|UPDATE|DELETE}_ROWS_EVENT. - Added comments to log_event_old.h This is a collapse of three changesets; see the bug report for details.
[19 Nov 2007 17:10]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/38082 ChangeSet@1.2618, 2007-11-19 17:27:34+01:00, sven@riska.(none) +2 -0 BUG#31581: 5.1-telco-6.1 -> 5.1.22. Slave crashes during starting - Merged Rows_log_event into Old_rows_log_event and {Write|Update|Delete}_rows_log_event into {Write|Update|Delete}_rows_log_event_old. - Fixed the bug by replacing {WRITE|UPDATE|DELETE}_ROWS_EVENT by PRE_GA_{WRITE|UPDATE|DELETE}_ROWS_EVENT. - Added comments to log_event_old.h This patch is identical to the previously committed patch which was a collapse of three changesets, except that it adds assert(0) to constructors for old types of row log events that should never be called.
[19 Nov 2007 17:10]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/38083 ChangeSet@1.2618, 2007-11-19 17:27:34+01:00, sven@riska.(none) +2 -0 BUG#31581: 5.1-telco-6.1 -> 5.1.22. Slave crashes during starting - Merged Rows_log_event into Old_rows_log_event and {Write|Update|Delete}_rows_log_event into {Write|Update|Delete}_rows_log_event_old. - Fixed the bug by replacing {WRITE|UPDATE|DELETE}_ROWS_EVENT by PRE_GA_{WRITE|UPDATE|DELETE}_ROWS_EVENT. - Added comments to log_event_old.h This patch is identical to the previously committed patch which was a collapse of three changesets, except that it adds assert(0) to constructors for old types of row log events that should never be called.
[19 Nov 2007 17:11]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/38084 ChangeSet@1.2618, 2007-11-19 17:27:34+01:00, sven@riska.(none) +2 -0 BUG#31581: 5.1-telco-6.1 -> 5.1.22. Slave crashes during starting - Merged Rows_log_event into Old_rows_log_event and {Write|Update|Delete}_rows_log_event into {Write|Update|Delete}_rows_log_event_old. - Fixed the bug by replacing {WRITE|UPDATE|DELETE}_ROWS_EVENT by PRE_GA_{WRITE|UPDATE|DELETE}_ROWS_EVENT. - Added comments to log_event_old.h This patch is identical to the previously committed patch which was a collapse of three changesets, except that it adds assert(0) to constructors for old types of row log events that should never be called.
[20 Nov 2007 18:48]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/38160 ChangeSet@1.2618, 2007-11-20 19:49:35+01:00, sven@riska.(none) +2 -0 BUG#31581: 5.1-telco-6.1 -> 5.1.22. Slave crashes during starting This patch has two purposes: (1) To refactor the code so that {Write|Update|Delete}_rows_log_event_old does not use code from {Write|Update|Delete}_rows_log_event. Before refactoring there was the following problem: whenever we modifed the code for new events, it affected the old events. This is bad, as it makes maintainance difficult. After refactoring, we can safely edit the new code without affecting old events. So, if we for instance modify the binary format of new events, we no longer need to worry about how the new code reads old events. (2) To fix BUG#31581. These two objectives are reached by the following changes: - Merged Rows_log_event into Old_rows_log_event and {Write|Update|Delete}_rows_log_event into {Write|Update|Delete}_rows_log_event_old. - Fixed the bug by replacing {WRITE|UPDATE|DELETE}_ROWS_EVENT by PRE_GA_{WRITE|UPDATE|DELETE}_ROWS_EVENT. - Added comments to log_event_old.h (This patch is identical to the previously committed patch which was a collapse of three changesets, except that it adds assert(0) to constructors for old types of row log events that should never be called.)
[21 Nov 2007 15:52]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/38214 ChangeSet@1.2618, 2007-11-21 16:53:46+01:00, sven@riska.(none) +1 -0 BUG#31581: 5.1-telco-6.1 -> 5.1.22. Slave crashes during starting The patch I previously pushed for this bug did not compile because a field in class THD had been renamed. This patch renames thd->query_error to thd->is_slave_error in log_event_old.cc by applying the same patch to log_event_old.cc as was previously applied to log_event.cc.
[5 Feb 2008 13:03]
Bugs System
Pushed into 5.1.24-rc
[5 Feb 2008 13:07]
Bugs System
Pushed into 6.0.5-alpha
[6 Feb 2008 15:56]
Jon Stephens
Documented bugfix in the 5.1.24 and 6.0.5 changelogs as follows: Replicating write, update, or delete events from a master running MySQL 5.1.15 or earlier to a slave running 5.1.16 or later caused the slave to crash.
[6 Mar 2008 9:52]
Jon Stephens
Also documented for 5.1.23-ndb-6.2.14.
[30 Mar 2008 20:22]
Jon Stephens
Also documented fix for 5.1.23-ndb-6.3.11.