Bug #49481 RBR: MyISAM and bit fields may cause slave to stop on delete: cant find record
Submitted: 6 Dec 2009 2:03 Modified: 15 Mar 2010 4:58
Reporter: Luis Soares Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Row Based Replication ( RBR ) Severity:S3 (Non-critical)
Version:5.1, next-mr OS:Any
Assigned to: Luis Soares CPU Architecture:Any
Tags: myisam, rbr field null
Triage: Triaged: D2 (Serious)

[6 Dec 2009 2:03] Luis Soares
Description:
Replication in mixed and RBR modes can sometimes break when
MyISAM tables with bit fields are in use.

I found this in the following scenario:

  S1> CREATE TABLE t1 (bit b, a int);
  S2> INSERT INTO t1 VALUES (1, 1);
  S3> UPDATE t1 SET b=NULL where a=1;

In S3 we change the bit field to be NULL instead of being
set. Then, if we later try to delete the row from the table (mind
the LIMIT 1, which causes switch to ROW in mixed mode):

  S4> DELETE FROM t1 WHERE c2=1 LIMIT 1;

The slave will stop, stating that it can't find the record to be
deleted:

Last_SQL_Error	Could not execute Delete_rows event on table test.t1; Can't find record in 't1', Error_code: 1032; handler error HA_ERR_END_OF_FILE; the event's master log master-bin.000001, end_log_pos 537

NOTES
=====

1. I found this in:

  - tree: mysql-5.1-bugteam
  - revid: davi.arnaut@sun.com-20091204153658-09w7l2edgszt8qe7

2. Running the test case (see how to repeat) on mtr, reports MySQL version:

  - MySQL Version 5.1.42

3. This seems to happen with MyISAM only.

How to repeat:
1. bzr clone -r revid:davi.arnaut@sun.com-20091204153658-09w7l2edgszt8qe7
$YOUR_LOCAL_MYSQL_REPO/mysql-5.1-bugteam
2. cd mysql-5.1-bugteam
3. ./BUILD/compile-pentium64-debug-max
4. cd mysql-test
5. create a test case with the following content:

-- source include/master-slave.inc
-- source include/have_binlog_format_mixed_or_row.inc

-- connection master
CREATE TABLE t1 (c1 BIT, c2 INT);
INSERT INTO `t1` VALUES ( 1, 1 );
UPDATE t1 SET c1=NULL where c2=1;
-- sync_slave_with_master

-- let $diff_table_1=master:test.t1
-- let $diff_table_2=slave:test.t1
-- source include/diff_tables.inc

-- connection master
DELETE FROM t1 WHERE c2=1 LIMIT 1;
-- sync_slave_with_master

-- let $diff_table_1=master:test.t1
-- let $diff_table_2=slave:test.t1
-- source include/diff_tables.inc

-- connection master
DROP TABLE t1; 
-- sync_slave_with_master

-- exit

6. ./mtr --mysqld=--default-storage-engine=myisam <your_test_file_name>
   
    This will cause the test case to fail with the reported error.

If I run (note innodb as default storage engine):

   ./mtr --mysqld=--default-storage-engine=innodb <your_test_file_name>

The test succeeds.

Suggested fix:
n/a
[6 Dec 2009 3:10] Luis Soares
MyISAM seems to store part of the BIT fields in the null
bits. This can raise problems when we are logging in row mode and
a MyISAM table is used. Consider the scenario presented in the
initial report (S# means statement number #):

  S1> CREATE TABLE t1 (c1 BIT, c2 INT);
  S2> INSERT INTO t1 VALUES (1, 1);
  S3> UPDATE t1 SET c1=NULL where c2=1;
  S4> DELETE FROM t1 WHERE c2=1 LIMIT 1;

Some notes:

  1. S4 will be logged as a Delete_rows_log_event.
  2. The null byte(s) of a MyISAM table contain part of a BIT
     field.
  3. Given, 2., we expect the bit field to be among the null
     byte(s).

Given that the DELETE is executed as a row event, the slave will:

  1. find the row

     This is done resorting to the Rows_log_event::find_row
     method. In it, the before image (BI) is unpacked and rows
     from the storage engine (SE) are fetched and
     compared. Fetching a row is done either by position (if a PK
     exists), by a key [scan] (if a key exists) or by a range
     scan.

     In both, range and key scan, a loop is performed and
     everytime a row is fetched from the SE it is compared
     against the row got from the BI. When a match is found, the
     slave successfully found the row to update/delete.

     Comparison can be done by a regular memcmp for the full
     rows (if no varchar or blob fields exist in the
     table). Otherwise, null bits are first matched with memcmp
     and if they match, a field by field comparison is then
     performed. The result of such comparisons determines the
     slave success on finding the desired row in the SE. This is
     done in the function log_event.cc:record_compare.

  2. if finding a row succeeded, the delete is performed by
     calling the handler ha_delete_row implementation.

Inspecting the slave execution (through a gdb session) during the
find_row method, I find, for the scenario in question, that the
values of the null byte are the following:

  unpacked before image: 0xF3  [11110011]
  row fetched from SE:   0XF7  [11110111]

Interpreting the bits (starting from the right - least
significant bits):

  1. X bit
  2. One null bit for field 'c1'
  3. One bit value for field 'c1'
  4. One null bit for field 'c2'
  5,6,7,8: filler bits

Note that there is a discrepancy in the null bits for the BI row
and the SE row. Although the third bit becomes a don't
care (stripped of meaning) when the null bit is set for that
field, it makes the row comparison to fail in find_row,
ultimately causing the slave to stop stating that it cannot find
the row to delete.
[6 Dec 2009 4:50] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/92989

3247 Luis Soares	2009-12-06
      BUG#49481: RBR: MyISAM and bit fields may cause slave to stop on delete: 
      cant find record
      
      BUG#49482: RBR: Replication may break on deletes when MyISAM tables + 
      char field are used
      
      When using MyISAM tables, despite the fact that the null bit is
      set for some fields, their old value is still in the row. This
      can cause the comparison of records to fail when the slave is
      doing an index or range scan.
      
      We fix this by avoiding memcmp for MyISAM tables when comparing
      records. Additionally, when comparing field by field, we first
      check if both fields are not null and if so, then we compare
      them. If just one field is null we return failure immediately. If
      both fields are null, we move on to the next field.
[6 Dec 2009 4:57] Luis Soares
See also: BUG#49482.
[7 Dec 2009 13:16] Sveta Smirnova
Verified as described.
[14 Jan 2010 14:27] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/96903

3307 Luis Soares	2010-01-14
      Fix for BUG#49481 and BUG#49482.
      
      BUG#49481: RBR: MyISAM and bit fields may cause slave to stop on delete: 
      cant find record
            
      BUG#49482: RBR: Replication may break on deletes when MyISAM tables + 
      char field are used
      
      When using MyISAM tables, despite the fact that the null bit is
      set for some fields, their old value is still in the row. This
      can cause the comparison of records to fail when the slave is
      doing an index or range scan.
      
      We fix this by avoiding memcmp for MyISAM tables when comparing
      records. Additionally, when comparing field by field, we first
      check if both fields are not null and if so, then we compare
      them. If just one field is null we return failure immediately. If
      both fields are null, we move on to the next field.
[14 Jan 2010 15:09] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/96921

3829 Luis Soares	2010-01-14 [merge]
      manual merge: Fix for BUG#49481 and BUG#49482 (5.1-bugteam --> mysql-pe).
[20 Jan 2010 0:11] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/97472

3320 Luis Soares	2010-01-20
      BUG#49481: RBR: MyISAM and bit fields may cause slave to stop on delete: 
      cant find record
      
      Some engines return data for the record. Despite the fact that
      the null bit is set for some fields, their old value may still in
      the row. This can happen when unpacking an AI from the binlog on
      top of a previous record in which a field is set to NULL, which
      previously contained a value. Ultimately, this may cause the
      comparison of records to fail when the slave is doing an index or
      range scan.
      
      We fix this by deploying a call to reset() for each field that is
      set to null while unpacking a row from the binary log.
      Furthermore, we also add mixed mode test case to cover the
      scenario where updating and setting a field to null through a
      Query event and later searching it through a rows event will
      succeed.
      
      Finally, we also change the reset() method, from Field_bit class,
      so that it takes into account bits stored among the null bits and
      not only the ones stored in the record.
     @ sql/field.h
        Changed reset so that it also clears the bits
        among the null_bits for the Field_bit class.
     @ sql/rpl_record.cc
        Resetting field after setting it to null when unpacking
        row.
[20 Jan 2010 22:14] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/97661

3320 Luis Soares	2010-01-20
      BUG#49481: RBR: MyISAM and bit fields may cause slave to stop on delete: 
      cant find record
      
      Some engines return data for the record. Despite the fact that
      the null bit is set for some fields, their old value may still in
      the row. This can happen when unpacking an AI from the binlog on
      top of a previous record in which a field is set to NULL, which
      previously contained a value. Ultimately, this may cause the
      comparison of records to fail when the slave is doing an index or
      range scan.
      
      We fix this by deploying a call to reset() for each field that is
      set to null while unpacking a row from the binary log.
      Furthermore, we also add mixed mode test case to cover the
      scenario where updating and setting a field to null through a
      Query event and later searching it through a rows event will
      succeed.
      
      Finally, we also change the reset() method, from Field_bit class,
      so that it takes into account bits stored among the null bits and
      not only the ones stored in the record.
     @ sql/field.h
        Changed reset so that it also clears the bits
        among the null_bits for the Field_bit class.
     @ sql/rpl_record.cc
        Resetting field after setting it to null when unpacking
        row.
[21 Jan 2010 17:20] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/97762

3320 Luis Soares	2010-01-21
      BUG#49481: RBR: MyISAM and bit fields may cause slave to stop on delete: 
      cant find record
      
      Some engines return data for the record. Despite the fact that
      the null bit is set for some fields, their old value may still in
      the row. This can happen when unpacking an AI from the binlog on
      top of a previous record in which a field is set to NULL, which
      previously contained a value. Ultimately, this may cause the
      comparison of records to fail when the slave is doing an index or
      range scan.
      
      We fix this by deploying a call to reset() for each field that is
      set to null while unpacking a row from the binary log.
      Furthermore, we also add mixed mode test case to cover the
      scenario where updating and setting a field to null through a
      Query event and later searching it through a rows event will
      succeed.
      
      Finally, we also change the reset() method, from Field_bit class,
      so that it takes into account bits stored among the null bits and
      not only the ones stored in the record.
     @ mysql-test/suite/rpl/t/rpl_set_null_innodb.test
        InnoDB test.
     @ mysql-test/suite/rpl/t/rpl_set_null_myisam.test
        MyISAM test.
     @ mysql-test/suite/rpl_ndb/t/rpl_ndb_set_null.test
        NDB test.
     @ sql/field.h
        Changed reset so that it also clears the bits
        among the null_bits for the Field_bit class.
     @ sql/rpl_record.cc
        Resetting field after setting it to null when unpacking
        row.
[4 Feb 2010 10:20] Bugs System
Pushed into 5.1.44 (revid:joro@sun.com-20100204101444-2j32mhqroo0iiio6) (version source revid:luis.soares@sun.com-20100126085522-0cbr9g33pf7x1bei) (merge vers: 5.1.43) (pib:16)
[5 Feb 2010 11:49] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100204063540-9czpdmpixi3iw2yb) (version source revid:alik@sun.com-20100130212638-ai8t5v3u6647p6d2) (pib:16)
[5 Feb 2010 11:55] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20100205113942-oqovjy0eoqbarn7i) (version source revid:alik@sun.com-20100204064210-ljwanqvrjs83s1gq) (merge vers: 6.0.14-alpha) (pib:16)
[5 Feb 2010 12:01] Bugs System
Pushed into 5.5.2-m2 (revid:alik@sun.com-20100203172258-1n5dsotny40yufxw) (version source revid:alik@sun.com-20100130191336-i53i9wx67n81ridm) (merge vers: 5.5.2-m2) (pib:16)
[10 Feb 2010 6:31] Jon Stephens
Documented bugfix in the 5.1.44, 5.5.2, and 6.0.14 changelogs, as follows:

      When using row-based replication, setting a BIT column of a MyISAM 
      table to NULL then trying to delete from the table caused the slave 
      to fail with the error -Can't find record in [table]-.

Closed.
[10 Feb 2010 6:51] Jon Stephens
Documented together with BUG#49482. Changelog entry updated:

      When using row-based replication, setting a BIT or CHAR column of a 
      MyISAM table to NULL then trying to delete from the table caused 
      the slave to fail with the error -Can't find record in [table]-.
[12 Mar 2010 14:15] Bugs System
Pushed into 5.1.44-ndb-7.0.14 (revid:jonas@mysql.com-20100312135944-t0z8s1da2orvl66x) (version source revid:jonas@mysql.com-20100312115609-woou0te4a6s4ae9y) (merge vers: 5.1.44-ndb-7.0.14) (pib:16)
[12 Mar 2010 14:31] Bugs System
Pushed into 5.1.44-ndb-6.2.19 (revid:jonas@mysql.com-20100312134846-tuqhd9w3tv4xgl3d) (version source revid:jonas@mysql.com-20100312060623-mx6407w2vx76h3by) (merge vers: 5.1.44-ndb-6.2.19) (pib:16)
[12 Mar 2010 14:47] Bugs System
Pushed into 5.1.44-ndb-6.3.33 (revid:jonas@mysql.com-20100312135724-xcw8vw2lu3mijrhn) (version source revid:jonas@mysql.com-20100312103652-snkltsd197l7q2yg) (merge vers: 5.1.44-ndb-6.3.33) (pib:16)
[15 Mar 2010 4:58] Jon Stephens
No additional changelog entries required. Returning to Closed state.