Bug #28430 | Failure in replication of innodb partitioned tables on row/mixed format | ||
---|---|---|---|
Submitted: | 15 May 2007 1:56 | Modified: | 21 Jun 2010 0:42 |
Reporter: | Omer Barnir (OCA) | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S2 (Serious) |
Version: | 5.1.19 | OS: | Any |
Assigned to: | Alexey Botchkov | CPU Architecture: | Any |
[15 May 2007 1:56]
Omer Barnir
[15 May 2007 1:59]
Omer Barnir
example of the diffrences between master and slave
Attachment: rpl_test_db.tar.gz (application/x-gunzip, text), 38.62 KiB.
[15 May 2007 2:00]
Omer Barnir
Test case for the above
Attachment: rpl_part_bug.test (application/octet-stream, text), 4.88 KiB.
[15 May 2007 2:02]
Omer Barnir
Expected results for the above test case
Attachment: rpl_part_bug.result (application/octet-stream, text), 5.40 KiB.
[15 May 2007 13:31]
Omer Barnir
Note, the number of rows can be controlled by changing the value of 'ins_count' in the stored procedures. For the 'byrange' table the problem shows with a number as small as 10 For the 'bykey' table the number needs to be at least 200 for the problem to show. I have also noticed that all the 'deleted' entries exist in the relay logs as expected.
[16 May 2007 1:21]
Omer Barnir
Correcxtion/clarification to the above: show status on the slave reports: Error#: 1032 Error in Delete_rows event: error during transaction execution on table test.pinsdel1_tbl
[16 May 2007 1:24]
Omer Barnir
*** Please Ignore Last comment from [16 May 3:21] *** show status on the slave reports: Error#: 1032 Error in Delete_rows event: error during transaction execution on table test.bykey_tbl
[18 Jun 2007 11:33]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/28982 ChangeSet@1.2554, 2007-06-18 15:33:46+05:00, holyfoot@mysql.com +1 -0 Bug #28430 Failiure in replication of innodb partitiitioned tables on row/mixed format. In the ha_partition::position() we didn't calculate the number of the partition of the record. We used m_last_part value instead, relying on that it is set in other place like previous call of ::write_row() or something. In replication, we do neither of these calls before ::position() Delete_rows_log_event::do_exec_row calls find_and_fetch_row there in the case of InnoDB partitions have HA_PRIMARY_KEY_REQUIRED_FOR_POSITION so use position() / rnd_pos() calls to fetch the record. Fixed by adding partition_id calculation to the ha_partition::position()
[25 Jun 2007 6:16]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/29475 ChangeSet@1.2554, 2007-06-25 10:15:05+05:00, holyfoot@mysql.com +1 -0 Bug #28430 Failiure in replication of innodb partitiitioned tables on row/mixed format. In the ha_partition::position() we didn't calculate the number of the partition of the record. We used m_last_part value instead, relying on that it is set in other place like previous call of ::write_row() or something. In replication, we do neither of these calls before ::position() Delete_rows_log_event::do_exec_row calls find_and_fetch_row there in the case of InnoDB partitions have HA_PRIMARY_KEY_REQUIRED_FOR_POSITION so use position() / rnd_pos() calls to fetch the record. Fixed by adding partition_id calculation to the ha_partition::position()
[26 Jun 2007 12:05]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/29589 ChangeSet@1.2514, 2007-06-26 15:33:57+05:00, holyfoot@mysql.com +3 -0 Bug #28430 Failure in replication of innodb partitioned tables on row/mixed format. In the ha_partition::position we don't calculate the number of the partition of the record. We use m_last_part_value instead relying on that it is set in other place like previous calls of ::write_row(). In replication we do neither of these calls before ::position(). Delete_row_log_event::do_exec_row calls find_and_fetch_row() where we used position() & rnd_pos() calls to find the record for the PARTITION/INNODB table as it posesses InnoDB table flags. Fixed by removing HA_PRIMARY_KEY_REQUIRED_FOR_POSITION flag from PARTITION
[26 Jun 2007 12:06]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/29590 ChangeSet@1.2514, 2007-06-26 15:36:38+05:00, holyfoot@mysql.com +3 -0 Bug #28430 Failure in replication of innodb partitioned tables on row/mixed format. In the ha_partition::position we don't calculate the number of the partition of the record. We use m_last_part_value instead relying on that it is set in other place like previous calls of ::write_row(). In replication we do neither of these calls before ::position(). Delete_row_log_event::do_exec_row calls find_and_fetch_row() where we used position() & rnd_pos() calls to find the record for the PARTITION/INNODB table as it posesses InnoDB table flags. Fixed by removing HA_PRIMARY_KEY_REQUIRED_FOR_POSITION flag from PARTITION
[28 Jun 2007 12:33]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/29851 ChangeSet@1.2518, 2007-06-28 16:32:43+05:00, holyfoot@mysql.com +3 -0 Bug #28430 Failure in replication of innodb partitioned tables on row/mixed format. In the ha_partition::position() we didn't calculate the number of the partition of the record. We used m_last_part value instead, relying on that it is set in other place like previous call of a method like ::write_row(). In replication we don't call any of these befor position(). Delete_rows_log_event::do_exec_row calls find_and_fetch_row. In case of InnoDB-based PARTITION table, we have HA_PRIMARY_KEY_REQUIRED_FOR_POSITION enabled, so use position() / rnd_pos() calls to fetch the record. Fixed by adding partition_id calculation to the ha_partition::position()
[28 Jun 2007 12:35]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/29852 ChangeSet@1.2518, 2007-06-28 16:35:21+05:00, holyfoot@mysql.com +3 -0 Bug #28430 Failure in replication of innodb partitioned tables on row/mixed format. In the ha_partition::position() we didn't calculate the number of the partition of the record. We used m_last_part value instead, relying on that it is set in other place like previous call of a method like ::write_row(). In replication we don't call any of these befor position(). Delete_rows_log_event::do_exec_row calls find_and_fetch_row. In case of InnoDB-based PARTITION table, we have HA_PRIMARY_KEY_REQUIRED_FOR_POSITION enabled, so use position() / rnd_pos() calls to fetch the record. Fixed by adding partition_id calculation to the ha_partition::position()
[1 Jul 2007 19:57]
Bugs System
Pushed into 5.1.21-beta
[5 Jul 2007 13:46]
Jon Stephens
Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release. If necessary, you can access the source repository and build the latest available version, including the bug fix. More information about accessing the source trees is available at http://dev.mysql.com/doc/en/installing-source.html Documented bugfix in 5.1.21 changelog.
[14 Aug 2007 14:04]
Omer Barnir
The problem is still observed with 5.1.21. Re opening the bug
[23 Aug 2007 10:10]
Mikael Ronström
Given that this one comes back as a regression I would prefer if a new solution is tried where the position call actually can trust that a read was done before and that therefore the partition id is set already. The only exception to the semantics of the position call is this call from replication. So I would prefer if the replication code uses a separate handler call that is normally implemented as position but for the partition handler translates into the new position call as the patch used. I don't really like the idea of changing the semantics of the position call.
[23 Aug 2007 19:38]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/32992 ChangeSet@1.2571, 2007-08-23 23:34:48+05:00, holyfoot@mysql.com +3 -0 Bug #28430 Failure in replication of innodb partitioned tables on row/mixed format. In the ha_partition::position() we didn't calculate the number of the partition of the record. We used m_last_part value instead, relying on that it is set in other place like previous call of a method like ::write_row(). In replication we don't call any of these befor position(). Delete_rows_log_event::do_exec_row calls find_and_fetch_row. In case of InnoDB-based PARTITION table, we have HA_PRIMARY_KEY_REQUIRED_FOR_POSITION enabled, so use position() / rnd_pos() calls to fetch the record. Fixed by adding partition_id calculation to the ha_partition::position()
[23 Aug 2007 20:36]
Mikael Ronström
The patch is from what I can see exactly the same patch as before. Also I don't really think it's a good idea to touch column_bitmap_signal, this is called many times in a query. Did you read my last comment?
[24 Aug 2007 7:20]
Bugs System
Pushed into 5.1.22-beta
[24 Aug 2007 9:32]
Jon Stephens
Hello, is this bug actually fixed or not? Thanks!
[24 Aug 2007 10:09]
Mikael Ronström
Jon, Since it is pushed it is fixed, but it is also reopened to fix it in another way. Much like bug#18198 which was reopened 3 times :)
[5 Sep 2007 0:50]
Omer Barnir
The bug is still not fixed and now the scenario described in the bug report causes the slave to crash at the point the where the test case is waiting for the slave to be sync'd with the master. error displayed on the screen is: mysqltest: At line 151: failed in 'select master_pos_wait('master-bin.000001', 945649)': 2006: MySQL server has gone away Note line 151 is the executuion of "-sync_slave_with_master" command The slave mysqld process is grashing after getting a signal 6 follwing InnoDB getting an "Assertion failure in thread 1106172848 in file row/row0sel.c line 4599" With backtrace: 0x8222ab3 0xffffe410 0x400eedbb 0x846a415 0x83de537 0x83de77d 0x83de8d4 0x830ec51 0x82dc38d 0x834fa6f 0x40032297 0x4018437e (see attached slave.err file for more information) Resolving the stack trace includes the following information: 0x8222ab3 handle_segfault + 819 0xffffe410 _end + -142709184 0x400eedbb _end + 932018155 0x846a415 row_search_max_autoinc + 917 0x83de537 _ZN11ha_innobase31innobase_read_and_init_auto_incEPx + 247 0x83de77d _ZN11ha_innobase27innobase_get_auto_incrementEPy + 237 0x83de8d4 _ZN11ha_innobase10delete_rowEPKh + 260 0x830ec51 _ZN7handler13ha_delete_rowEPKh + 33 0x82dc38d _ZN14Rows_log_event14do_apply_eventEPK14Relay_log_info + 701 0x834fa6f handle_slave_sql + 1727 0x40032297 _end + 931245255 0x4018437e _end + 932629934 Resolving the stack trace
[5 Sep 2007 0:52]
Omer Barnir
mysqld slave.err file including crash information
Attachment: slave.err (application/octet-stream, text), 3.81 KiB.
[7 Sep 2007 14:48]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/33912 ChangeSet@1.2586, 2007-09-07 18:41:49+05:00, holyfoot@mysql.com +4 -0 Bug #28430 Failure in replication of innodb partitioned tables on row/mixed format. In the ha_partition::position() we don't calculate the number of the partition of the record, but use m_last_part value instead, relying on that it's previously set by some other call like ::write_row(). Delete_rows_log_event::do_exec_row() calls find_and_fetch_row(), where we used position() + rnd_pos() call for the InnoDB-based PARTITION-ed table as there HA_PRIMARY_KEY_REQUIRED_FOR_POSITION enabled. fixed by introducing new handler::rnd_pos_by_record() method to be used for random record-based positioning
[7 Sep 2007 17:35]
Omer Barnir
hf has isolated the crash problem as un related to partitions but to deleting rows (see bug=30888). Also while investigating the issue found 30907 relating to inserts into innodb partitioned tables when using --innodb_autoinc_lock_mode=0
[8 Sep 2007 9:18]
Mikael Ronström
I like this second solution much better
[12 Sep 2007 17:18]
Paul DuBois
Noted in 5.1.22 changelog. Replication of InnoDB partitioned tables could lose updates with row-based or mixed replication format. Report remains in Patch queued status.
[14 Sep 2007 7:45]
Bugs System
Pushed into 5.1.23-beta
[20 Sep 2007 7:16]
Jon Stephens
Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release. If necessary, you can access the source repository and build the latest available version, including the bug fix. More information about accessing the source trees is available at http://dev.mysql.com/doc/en/installing-source.html Already documented; closed after discussing with Paul.
[5 Oct 2007 17:57]
Bugs System
Pushed into 5.1.23-beta
[20 Nov 2007 7:46]
Jon Stephens
Moved bugfix report to 5.1.23 changelog, given failure was still noted in 5.1.22 with subsequent pushes to 5.1.23.
[5 May 2010 15:09]
Bugs System
Pushed into 5.1.47 (revid:joro@sun.com-20100505145753-ivlt4hclbrjy8eye) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[6 May 2010 2:58]
Paul DuBois
Push resulted from incorporation of InnoDB tree. No changes pertinent to this bug. Re-closing.
[28 May 2010 6:10]
Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100524190136-egaq7e8zgkwb9aqi) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (pib:16)
[28 May 2010 6:38]
Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20100524190941-nuudpx60if25wsvx) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[28 May 2010 7:06]
Bugs System
Pushed into 5.5.5-m3 (revid:alik@sun.com-20100524185725-c8k5q7v60i5nix3t) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[29 May 2010 2:58]
Paul DuBois
Push resulted from incorporation of InnoDB tree. No changes pertinent to this bug. Re-closing.
[17 Jun 2010 12:15]
Bugs System
Pushed into 5.1.47-ndb-7.0.16 (revid:martin.skold@mysql.com-20100617114014-bva0dy24yyd67697) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[17 Jun 2010 13:02]
Bugs System
Pushed into 5.1.47-ndb-6.2.19 (revid:martin.skold@mysql.com-20100617115448-idrbic6gbki37h1c) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[17 Jun 2010 13:43]
Bugs System
Pushed into 5.1.47-ndb-6.3.35 (revid:martin.skold@mysql.com-20100617114611-61aqbb52j752y116) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)