Bug #49482 RBR: Replication may break on deletes when MyISAM tables + char field are used
Submitted: 6 Dec 2009 4:05 Modified: 15 Mar 2010 4:59
Reporter: Luis Soares Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Row Based Replication ( RBR ) Severity:S3 (Non-critical)
Version:5.1, next-mr OS:Any
Assigned to: Luis Soares CPU Architecture:Any
Triage: Triaged: D2 (Serious)

[6 Dec 2009 4:05] Luis Soares
Description:
Replication in mixed and RBR modes can sometimes break when using
MyISAM tables with char, enum and possibly other similar fields.

I found this in the following scenario (S# statements are
executed on master):

 S1> CREATE TABLE t1 (c1 CHAR);
 S2> INSERT INTO t1 ( c1 ) VALUES ( 'w' ) ;
 S3> UPDATE t1 SET c1=NULL WHERE c1='w' LIMIT 2;
 S4> DELETE FROM t1 LIMIT 2;

In S3, we set the field to NULL. In S4 we delete the field from
the table. The LIMIT 2 clauses are there to force a switch to row
in MIXED mode.

When the slave tries to replay the Delete_rows_log_event
originated from S4, it will stop with:

Last_SQL_Error	Could not execute Delete_rows event on table test.t1; Can't find record in 't1', Error_code: 1032; handler error HA_ERR_END_OF_FILE; the event's master log master-bin.000001, end_log_pos 648

NOTES
=====

1. I found this in:

  - tree: mysql-5.1-bugteam
  - revid: davi.arnaut@sun.com-20091204153658-09w7l2edgszt8qe7

2. Running the test case (see how to repeat) on mtr, reports MySQL version:

  - MySQL Version 5.1.42

3. This seems to happen with MyISAM only.

How to repeat:
1. bzr clone -r revid:davi.arnaut@sun.com-20091204153658-09w7l2edgszt8qe7
$YOUR_LOCAL_MYSQL_REPO/mysql-5.1-bugteam
2. cd mysql-5.1-bugteam
3. ./BUILD/compile-pentium64-debug-max
4. cd mysql-test
5. create a test case with the following content:

-- source include/master-slave.inc
-- source include/have_binlog_format_mixed_or_row.inc

CREATE TABLE t1 (c1 CHAR);

INSERT INTO t1 ( c1 ) VALUES ( 'w' ) ;
-- echo # should trigger switch to row due to LIMIT
UPDATE t1 SET c1=NULL WHERE c1='w' LIMIT 2;
-- sync_slave_with_master

-- let $diff_table_1=master:test.t1
-- let $diff_table_2=slave:test.t1
-- source include/diff_tables.inc

-- connection master
DELETE FROM t1 LIMIT 2;
-- sync_slave_with_master

-- let $diff_table_1=master:test.t1
-- let $diff_table_2=slave:test.t1
-- source include/diff_tables.inc

-- connection master
DROP TABLE t1;
-- sync_slave_with_master
-- exit

6. ./mtr --mysqld=--default-storage-engine=myisam <your_test_file_name>

    This will cause the test case to fail with the reported error.

If I run (note innodb as default storage engine):

   ./mtr --mysqld=--default-storage-engine=innodb <your_test_file_name>

The test succeeds.

Suggested fix:
n/a
[6 Dec 2009 4:05] Luis Soares
MyISAM seems to be returning content in the field despite the
fact that it has the null bit set. This can cause problems to the
slave, when it is trying to find the row to delete.

In RBR, the delete row event execution is split into two stages:

  1. find the row

     This is done resorting to the Rows_log_event::find_row
     method. In it, the before image (BI) is unpacked and rows
     from the storage engine (SE) are fetched and
     compared. Fetching a row is done either by position (if a PK
     exists), by a key [scan] (if a key exists) or by a range
     scan.

     In both, range and key scan, a loop is performed and
     everytime a row is fetched from the SE it is compared
     against the row got from the BI. When an exact match is
     found, the slave has successfully found the row to
     update/delete.

     Comparison can be done by a regular memcmp for the full
     rows (if no varchar or blob fields exist in the
     table). Otherwise, null bits are first matched with memcmp
     and if they match, a field by field comparison is then
     performed. The result of such comparisons determines the
     slave success on finding the desired row in the SE. This is
     done in the function log_event.cc:record_compare.

  2. if finding a row succeeded, the delete is performed by
     calling the handler ha_delete_row implementation.

Inspecting the slave execution (through a gdb session) during the
find_row method, I find, for the scenario in question (see how to
repeat section), that although the null bits match, the field
value for the BI row and SE row are different. This will cause
the record_compare function to fail either on the full rows
memcmp, or in the field by field comparison (if the table has
additional varchar or blob fields). For the record, here are the
bytes (two - one null byte and one byte for the values) for the
BI and SE row by the time the slave tries to match both (in
find_row):

  BI: 0xFF 0x20
  SE: 0xFF 0x77

From these values, one can notice that:

   1. the null bytes are correct (are all set - filler, X and
      null bits - meaning that the null bit for the field is
      set).

   2. the byte for the CHAR field exhibits 0x20 (space in ASCII
      code) for the BI row and 0x77 ('w' in ASCII code).

So this means that although the null bit was set value was not
erased by the SE.
[6 Dec 2009 4:57] Luis Soares
See also: BUG#49481.
[7 Dec 2009 13:28] Sveta Smirnova
Thank you for the report.

Verified as described.
[11 Dec 2009 8:02] Mats Kindahl
This bug is a duplicate of BUG#49481 and is fixed by the patch for that bug.
[11 Dec 2009 9:56] Luis Soares
Moving back to verified.
The problems are related but have different root cause. Incidentally, the patch for BUG#49481 fixes this one as well.
[14 Jan 2010 14:27] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/96903

3307 Luis Soares	2010-01-14
      Fix for BUG#49481 and BUG#49482.
      
      BUG#49481: RBR: MyISAM and bit fields may cause slave to stop on delete: 
      cant find record
            
      BUG#49482: RBR: Replication may break on deletes when MyISAM tables + 
      char field are used
      
      When using MyISAM tables, despite the fact that the null bit is
      set for some fields, their old value is still in the row. This
      can cause the comparison of records to fail when the slave is
      doing an index or range scan.
      
      We fix this by avoiding memcmp for MyISAM tables when comparing
      records. Additionally, when comparing field by field, we first
      check if both fields are not null and if so, then we compare
      them. If just one field is null we return failure immediately. If
      both fields are null, we move on to the next field.
[14 Jan 2010 15:09] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/96921

3829 Luis Soares	2010-01-14 [merge]
      manual merge: Fix for BUG#49481 and BUG#49482 (5.1-bugteam --> mysql-pe).
[4 Feb 2010 10:18] Bugs System
Pushed into 5.1.44 (revid:joro@sun.com-20100204101444-2j32mhqroo0iiio6) (version source revid:luis.soares@sun.com-20100119001000-kbmui3d351o9xem0) (merge vers: 5.1.43) (pib:16)
[5 Feb 2010 11:46] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100204063540-9czpdmpixi3iw2yb) (version source revid:alik@sun.com-20100127195551-hzccsc9m2o2ir1j0) (pib:16)
[5 Feb 2010 11:53] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20100205113942-oqovjy0eoqbarn7i) (version source revid:alik@sun.com-20100204064210-ljwanqvrjs83s1gq) (merge vers: 6.0.14-alpha) (pib:16)
[5 Feb 2010 11:58] Bugs System
Pushed into 5.5.2-m2 (revid:alik@sun.com-20100203172258-1n5dsotny40yufxw) (version source revid:alexey.kopytov@sun.com-20100123210923-lx4o1ettww9fdkqk) (merge vers: 5.5.2-m2) (pib:16)
[10 Feb 2010 6:49] Jon Stephens
Documented together with BUG#49481, qv.

Closed.
[12 Mar 2010 14:16] Bugs System
Pushed into 5.1.44-ndb-7.0.14 (revid:jonas@mysql.com-20100312135944-t0z8s1da2orvl66x) (version source revid:jonas@mysql.com-20100312115609-woou0te4a6s4ae9y) (merge vers: 5.1.44-ndb-7.0.14) (pib:16)
[12 Mar 2010 14:32] Bugs System
Pushed into 5.1.44-ndb-6.2.19 (revid:jonas@mysql.com-20100312134846-tuqhd9w3tv4xgl3d) (version source revid:jonas@mysql.com-20100312060623-mx6407w2vx76h3by) (merge vers: 5.1.44-ndb-6.2.19) (pib:16)
[12 Mar 2010 14:49] Bugs System
Pushed into 5.1.44-ndb-6.3.33 (revid:jonas@mysql.com-20100312135724-xcw8vw2lu3mijrhn) (version source revid:jonas@mysql.com-20100312103652-snkltsd197l7q2yg) (merge vers: 5.1.44-ndb-6.3.33) (pib:16)
[15 Mar 2010 4:59] Jon Stephens
No additional changelog entries required. Returning to Closed state.