MySQL Bugs: #28430: Failure in replication of innodb partitioned tables on row/mixed format

Bug #28430	Failure in replication of innodb partitioned tables on row/mixed format
Submitted:	15 May 2007 1:56	Modified:	21 Jun 2010 0:42
Reporter:	Omer Barnir (OCA)	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Replication	Severity:	S2 (Serious)
Version:	5.1.19	OS:	Any
Assigned to:	Alexey Botchkov	CPU Architecture:	Any

Description:
Missing updates observed on the slave when replicating innodb partoiopmed 
tables using rbr/mbr modes.
Details:
Tables defined as:
CREATE TABLE test.bykey_tbl(id MEDIUMINT NOT NULL AUTO_INCREMENT,
                           dt TIMESTAMP, user CHAR(255), uuidf LONGBLOB,
                           fkid MEDIUMINT, filler VARCHAR(255),
                           PRIMARY KEY(id)) ENGINE='innodb'
                                PARTITION BY KEY(id) partitions 5;

CREATE TABLE test.byrange_tbl(id MEDIUMINT NOT NULL AUTO_INCREMENT,
                           dt TIMESTAMP, user CHAR(255), uuidf LONGBLOB,
                           fkid MEDIUMINT, filler VARCHAR(255),
                           PRIMARY KEY(id)) ENGINE='innodb'
                                PARTITION BY RANGE(id)
                                SUBPARTITION BY hash(id) subpartitions 2
                                (PARTITION pa1 values less than (10),
                                 PARTITION pa2 values less than (20),
                                 PARTITION pa3 values less than (30),
                                 PARTITION pa4 values less than (40),
                                 PARTITION pa5 values less than (50),
                                 PARTITION pa6 values less than (60),
                                 PARTITION pa7 values less than (70),
                                 PARTITION pa8 values less than (80),
                                 PARTITION pa9 values less than (90),
                                 PARTITION pa10 values less than (100),
                                 PARTITION pa11 values less than MAXVALUE);
The tables are populated using stored procedures that populate 1000 rows and then remove every second line (see attached test case) to the end result is 
that each table has 500 lines after the call to the storered procedure.

However on the slave only SOME of the rows are deleted from each of the tables
so te end result is that the slave tables have MORE rows than than are expected.

Note:
 - No replication errors are reported and replication is reported as completed
   successfuly
 - The problem is not observed with non partitioned tables (see test case)
 - the problem is not observed if using myisam
 - The problem is not observed if the slave database has 'innodb' disabled
   i.e. the tables are created on the slave as myisam.
 - To see the differences download the attached rpl_test_db.tar.gz file

How to repeat:
download the attached test and result file and copy them to '/t' and '/r'.

To see the problem as described run:
perl ./mysql-test-run.pl --mysqld="--innodb" rpl_part_bug       

To see the problem does not exist if innodb is disabled on the slave run 
perl ./mysql-test-run.pl rpl_part_bug       

To see that the problem is not observed with statement based replication run
perl ./mysql-test-run.pl --mysqld="--binlog_format=statement" \
                         --mysqld="--innodb" rpl_part_bug 

To see the problem is not observed with MyISAM change the value of the $engine_type variable in the test case

Suggested fix:
Replicated tables should be identincal

example of the diffrences between master and slave

Attachment: rpl_test_db.tar.gz (application/x-gunzip, text), 38.62 KiB.

Test case for the above

Attachment: rpl_part_bug.test (application/octet-stream, text), 4.88 KiB.

Expected results for the above test case

Attachment: rpl_part_bug.result (application/octet-stream, text), 5.40 KiB.

Note, the number of rows can be controlled by changing the value of 'ins_count' in the stored procedures. 
For the 'byrange' table the problem shows with a number as small as 10
For the 'bykey' table the number needs to be at least 200 for the problem to show.
I have also noticed that all the 'deleted' entries exist in the relay logs as expected.

Correcxtion/clarification to the above:
show status on the slave reports:
Error#: 1032
Error in Delete_rows event: error during transaction execution on table test.pinsdel1_tbl

*** Please Ignore Last comment from  [16 May 3:21] ***
show status on the slave reports:
Error#: 1032
Error in Delete_rows event: error during transaction execution on table
test.bykey_tbl

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/28982

ChangeSet@1.2554, 2007-06-18 15:33:46+05:00, holyfoot@mysql.com +1 -0
  Bug #28430 Failiure in replication of innodb partitiitioned tables on row/mixed format.
  
  In the ha_partition::position() we didn't calculate the number
  of the partition of the record. We used m_last_part value
  instead, relying on that it is set in other place like
  previous call of ::write_row() or something.
  In replication, we do neither of these calls before ::position()
  Delete_rows_log_event::do_exec_row calls find_and_fetch_row
  there in the case of InnoDB partitions have HA_PRIMARY_KEY_REQUIRED_FOR_POSITION
  so use position() / rnd_pos() calls to fetch the record.
  Fixed by adding partition_id calculation to the ha_partition::position()

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/29475

ChangeSet@1.2554, 2007-06-25 10:15:05+05:00, holyfoot@mysql.com +1 -0
  Bug #28430 Failiure in replication of innodb partitiitioned tables on row/mixed format.
  
  In the ha_partition::position() we didn't calculate the number
  of the partition of the record. We used m_last_part value
  instead, relying on that it is set in other place like
  previous call of ::write_row() or something.
  In replication, we do neither of these calls before ::position()
  Delete_rows_log_event::do_exec_row calls find_and_fetch_row
  there in the case of InnoDB partitions have HA_PRIMARY_KEY_REQUIRED_FOR_POSITION
  so use position() / rnd_pos() calls to fetch the record.
  Fixed by adding partition_id calculation to the ha_partition::position()

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/29589

ChangeSet@1.2514, 2007-06-26 15:33:57+05:00, holyfoot@mysql.com +3 -0
  Bug #28430 Failure in replication of innodb partitioned tables on row/mixed format.
  
  In the ha_partition::position we don't calculate the number of
  the partition of the record. We use m_last_part_value instead relying on
  that it is set in other place like previous calls of ::write_row().
  In replication we do neither of these calls before ::position().
  Delete_row_log_event::do_exec_row calls find_and_fetch_row() where
  we used position() & rnd_pos() calls to find the record for the
  PARTITION/INNODB table as it posesses InnoDB table flags.
  Fixed by removing HA_PRIMARY_KEY_REQUIRED_FOR_POSITION flag from PARTITION

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/29590

ChangeSet@1.2514, 2007-06-26 15:36:38+05:00, holyfoot@mysql.com +3 -0
  Bug #28430 Failure in replication of innodb partitioned tables on row/mixed format.
  
  In the ha_partition::position we don't calculate the number of
  the partition of the record. We use m_last_part_value instead relying on
  that it is set in other place like previous calls of ::write_row().
  In replication we do neither of these calls before ::position().
  Delete_row_log_event::do_exec_row calls find_and_fetch_row() where
  we used position() & rnd_pos() calls to find the record for the
  PARTITION/INNODB table as it posesses InnoDB table flags.
  Fixed by removing HA_PRIMARY_KEY_REQUIRED_FOR_POSITION flag from PARTITION

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/29851

ChangeSet@1.2518, 2007-06-28 16:32:43+05:00, holyfoot@mysql.com +3 -0
  Bug #28430 Failure in replication of innodb partitioned tables on row/mixed format.
  
  In the ha_partition::position() we didn't calculate the number
  of the partition of the record. We used m_last_part value instead,
  relying on that it is set in other place like previous call of a method
  like ::write_row(). In replication we don't call any of these befor
  position(). Delete_rows_log_event::do_exec_row calls find_and_fetch_row.
  In case of InnoDB-based PARTITION table, we have HA_PRIMARY_KEY_REQUIRED_FOR_POSITION
  enabled, so use position() / rnd_pos() calls to fetch the record.
  
  Fixed by adding partition_id calculation to the ha_partition::position()

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/29852

ChangeSet@1.2518, 2007-06-28 16:35:21+05:00, holyfoot@mysql.com +3 -0
  Bug #28430 Failure in replication of innodb partitioned tables on row/mixed format.
  
  In the ha_partition::position() we didn't calculate the number
  of the partition of the record. We used m_last_part value instead,
  relying on that it is set in other place like previous call of a method
  like ::write_row(). In replication we don't call any of these befor
  position(). Delete_rows_log_event::do_exec_row calls find_and_fetch_row.
  In case of InnoDB-based PARTITION table, we have HA_PRIMARY_KEY_REQUIRED_FOR_POSITION
  enabled, so use position() / rnd_pos() calls to fetch the record.
  
  Fixed by adding partition_id calculation to the ha_partition::position()

Pushed into 5.1.21-beta

Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release.

If necessary, you can access the source repository and build the latest available version, including the bug fix. More information about accessing the source trees is available at

    http://dev.mysql.com/doc/en/installing-source.html

Documented bugfix in 5.1.21 changelog.

The problem is still observed with 5.1.21. Re opening the bug

Given that this one comes back as a regression I would prefer if a new solution is tried
where the position call actually can trust that a read was done before and that therefore
the partition id is set already.

The only exception to the semantics of the position call is this call from replication. So I
would prefer if the replication code uses a separate handler call that is normally implemented
as position but for the partition handler translates into the new position call as the patch used.
I don't really like the idea of changing the semantics of the position call.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/32992

ChangeSet@1.2571, 2007-08-23 23:34:48+05:00, holyfoot@mysql.com +3 -0
  Bug #28430 Failure in replication of innodb partitioned tables on row/mixed format.
  
  In the ha_partition::position() we didn't calculate the number
  of the partition of the record. We used m_last_part value instead,
  relying on that it is set in other place like previous call of a method
  like ::write_row(). In replication we don't call any of these befor
  position(). Delete_rows_log_event::do_exec_row calls find_and_fetch_row.
  In case of InnoDB-based PARTITION table, we have HA_PRIMARY_KEY_REQUIRED_FOR_POSITION
  enabled, so use position() / rnd_pos() calls to fetch the record.
  
  Fixed by adding partition_id calculation to the ha_partition::position()

The patch is from what I can see exactly the same patch as before.
Also I don't really think it's a good idea to touch column_bitmap_signal,
this is called many times in a query.

Did you read my last comment?

Pushed into 5.1.22-beta

Hello, is this bug actually fixed or not? Thanks!

Jon,
Since it is pushed it is fixed, but it is also reopened to fix it in another way.
Much like bug#18198 which was reopened 3 times :)

The bug is still not fixed and now the scenario described in the bug report causes the slave to crash at the point the where the test case is waiting for the slave to be sync'd with the master.

error displayed on the screen is:
mysqltest: At line 151: failed in 'select master_pos_wait('master-bin.000001', 945649)': 2006: MySQL server has gone away

Note line 151 is the executuion of "-sync_slave_with_master" command

The slave mysqld process is grashing after getting a signal 6 follwing InnoDB getting an "Assertion failure in thread 1106172848 in file row/row0sel.c line 4599"

With backtrace:
0x8222ab3
0xffffe410
0x400eedbb
0x846a415
0x83de537
0x83de77d
0x83de8d4
0x830ec51
0x82dc38d
0x834fa6f
0x40032297
0x4018437e

(see attached slave.err file for more information)

Resolving the stack trace includes the following information:
0x8222ab3 handle_segfault + 819
0xffffe410 _end + -142709184
0x400eedbb _end + 932018155
0x846a415 row_search_max_autoinc + 917
0x83de537 _ZN11ha_innobase31innobase_read_and_init_auto_incEPx + 247
0x83de77d _ZN11ha_innobase27innobase_get_auto_incrementEPy + 237
0x83de8d4 _ZN11ha_innobase10delete_rowEPKh + 260
0x830ec51 _ZN7handler13ha_delete_rowEPKh + 33
0x82dc38d _ZN14Rows_log_event14do_apply_eventEPK14Relay_log_info + 701
0x834fa6f handle_slave_sql + 1727
0x40032297 _end + 931245255
0x4018437e _end + 932629934

Resolving the stack trace

mysqld slave.err file including crash information

Attachment: slave.err (application/octet-stream, text), 3.81 KiB.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/33912

ChangeSet@1.2586, 2007-09-07 18:41:49+05:00, holyfoot@mysql.com +4 -0
  Bug #28430 Failure in replication of innodb partitioned tables on row/mixed format.
  
  In the ha_partition::position() we don't calculate the number
  of the partition of the record, but use m_last_part value instead,
  relying on that it's previously set by some other call like ::write_row().
  Delete_rows_log_event::do_exec_row() calls find_and_fetch_row(),
  where we used position() + rnd_pos() call for the InnoDB-based PARTITION-ed
  table as there HA_PRIMARY_KEY_REQUIRED_FOR_POSITION enabled.
  
  fixed by introducing new handler::rnd_pos_by_record() method to be
  used for random record-based positioning

hf has isolated the crash problem as un related to partitions but to deleting rows (see bug=30888). Also while investigating the issue found 30907 relating to inserts into innodb partitioned tables when using --innodb_autoinc_lock_mode=0

I like this second solution much better

Noted in 5.1.22 changelog.

Replication of InnoDB partitioned tables could lose updates with
row-based or mixed replication format.

Report remains in Patch queued status.

Pushed into 5.1.23-beta

Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release.

If necessary, you can access the source repository and build the latest available version, including the bug fix. More information about accessing the source trees is available at

    http://dev.mysql.com/doc/en/installing-source.html

Already documented; closed after discussing with Paul.

Pushed into 5.1.23-beta

Moved bugfix report to 5.1.23 changelog, given failure was still noted in 5.1.22 with subsequent pushes to 5.1.23.

Pushed into 5.1.47 (revid:joro@sun.com-20100505145753-ivlt4hclbrjy8eye) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)

Push resulted from incorporation of InnoDB tree. No changes pertinent to this bug. Re-closing.

Pushed into mysql-next-mr (revid:alik@sun.com-20100524190136-egaq7e8zgkwb9aqi) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (pib:16)

Pushed into 6.0.14-alpha (revid:alik@sun.com-20100524190941-nuudpx60if25wsvx) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)

Pushed into 5.5.5-m3 (revid:alik@sun.com-20100524185725-c8k5q7v60i5nix3t) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)

Push resulted from incorporation of InnoDB tree. No changes pertinent to this bug.
Re-closing.

Pushed into 5.1.47-ndb-7.0.16 (revid:martin.skold@mysql.com-20100617114014-bva0dy24yyd67697) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)

Pushed into 5.1.47-ndb-6.2.19 (revid:martin.skold@mysql.com-20100617115448-idrbic6gbki37h1c) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)

Pushed into 5.1.47-ndb-6.3.35 (revid:martin.skold@mysql.com-20100617114611-61aqbb52j752y116) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)