Bug #49100 RBR: Unexpected behavior when AI contains no usable data for slave columns
Submitted: 25 Nov 2009 13:17 Modified: 29 Nov 2010 13:54
Reporter: Luis Soares Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Row Based Replication ( RBR ) Severity:S3 (Non-critical)
Version:6.0-codebase, mysql-5.1-rpl-wl5092 OS:Any
Assigned to: Luis Soares CPU Architecture:Any
Tags: RBR
Triage: Triaged: D2 (Serious)

[25 Nov 2009 13:17] Luis Soares
Description:
Problem Statement
=================

There is an inconsistent behavior in RBR when empty write sets
are generated in the event after image (AI). If the slave has no
usable data in the AI, then it exhibits unexpected behavior. This
issue is hidden in 5.1 GA and currently in mysql-next-mr because
these codebases always send the full before (BI) and after (AI)
images. 

However, that is not the case in mysql-6.0-codebase, where we
just transfer the necessary parts of the AI and BI from the
master into the slave, which triggers this bug. Furthermore, once
WL#5092 is pushed into next-mr, this bug will surface in there as
well (when the user specifies that he wants to use minimal row
images).

I'll split the description into three cases.

Case #1
-------

  Problem: both master and slaves have the same table defined
           with explicit default values, and master inserts an
           empty row (this results in empty AI).
   
           MASTER> CREATE TABLE t1 (c1 int DEFAULT 100);
           SLAVE> CREATE TABLE t1 (c1 int DEFAULT 100);
           MASTER> INSERT INTO t1 VALUES ();

           At this point when the slave handles the write_rows
           event with the empty write_set it will just skip
           it. This results in master having one extra row, with
           c1=100, whereas the slave has no row at all.

Case #2
-------

  Problem: master has extra columns in table and declares a PK
           over these extra columns. Then it inserts values for
           just the columns in the PK (this results in AI
           containing only data for master extra columns).

           MASTER> CREATE TABLE t1 (c1 int DEFAULT 100, c2 int, primary key(c2));
           SLAVE> CREAT TABLE t1 (c1 int DEFAULT 100);
           MASTER> INSERT INTO t1(c2) VALUES (1);           
           
           At this point slave will correctly exhibit the row
           with c1=100. However, if we have a third
           slave (SLAVE2) attached to SLAVE, in a chained
           replication scenario, SLAVE2 will not show row with
           c1=100 (in fact, for this case its table - t1 - will
           be empty).

Case #3
-------

  Problem: master has extra columns in table and declares a PK
           over these extra columns. Then it inserts values for
           all columns and right after it updates the extra
           column only, but references common columns in the
           WHERE clause (this causes the AI to contain data for
           some extra columns, and BI to contain data for both,
           some extra and some common columns). Furthermore,
           consider that we are in a chained replication
           scenario (MASTER -> SLAVE1 -> SLAVE2):

           MASTER> CREATE TABLE t1 (c1 int DEFAULT 100, c2 int, primary key(c2));
           SLAVE1> CREAT TABLE t1 (c1 int DEFAULT 100);
           SLAVE2> CREAT TABLE t1 (c1 int DEFAULT 100);
           MASTER> INSERT INTO t1(c2) VALUES (1,1);
           MASTER> UPDATE t1 SET c2=2 WHERE c1=1 and c2=1;

           At this point SLAVE2 will hit an assertion, in
           unpack_current_row:

           ASSERT_OR_RETURN_ERROR(m_curr_row < m_rows_end, HA_ERR_CORRUPT_EVENT);

NOTES
=====

 N1. In practice the bugs exists in 5.1 and next-mr but we cannot
     currently trigger the above cases. So we should do nothing
     wrt 5.1, should be fine as it is. However, we should fix
     this in next-mr, because WL#5092 - which implements partial
     BI and AI - is targeting that codebase. For this we can wait
     until WL#5092 is pushed or do it in parallel. Either way
     should be fine.

 N2. Currently visible effects exist only in 6.0 codebase. But,
     again, given N1, it will surface in next-mr eventually.

 N3. Find attached the files with the test case I used to
     reproduce this.

REFERENCES
==========

 - made bug visible: BUG#33055
 - partial images work: BUG#14068, WL#5092
 - WL for testing WL#5092: WL#5096

How to repeat:
- You will need to download rpl_chained.tar.bz2 (attached in
  the bug report). It is needed for step #5 below.

- I tested this in mysql-6.0-codebase revision id: 
  alik@sun.com-20091123115308-acsdo1inhcfl2oo8

STEPS
=====

1. bzr clone $LOCAL_BZR_REPO/mysql-6.0-codebase
2. cd mysql-6.0-codebase
3. ./BUILD/compile-...-debug-max
4. cd mysql-test
5. tar xfvj rpl_chained.tar.bz2
6. perl mysql-test-run.pl rpl_empty_write_set

Suggested fix:
Make slaves to work with empty write sets and write sets with no data for their own tables.
[25 Nov 2009 13:18] Luis Soares
Test case used for reproducing the problem described.

Attachment: rpl_chained.tar.bz2 (application/x-bzip, text), 1.28 KiB.

[25 Nov 2009 15:49] Luis Soares
Test case used for reproducing the problem described.

Attachment: rpl_chained.tar.gz (application/x-gzip, text), 1.28 KiB.

[25 Nov 2009 15:52] Luis Soares
File uploaded in "[25 Nov 14:18] Luís Soares" had wrong extension. Should be tar.gz. I have uploaded the same file with extension fixed.

Step #5 in the "how to repeat" section becomes:

 5. tar xfvz rpl_chained.tar.gz
[30 Nov 2009 12:12] Miguel Solorzano
Thank you for the bug report.
[29 Mar 2010 18:32] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/104578

3183 Luis Soares	2010-03-29
      BUG#49100: RBR: Unexpected behavior when AI contains no usable data 
                 for slave columns
      
      This is a (pretty complete) sketch of a patch for this bug
      built on top of:
      
        - mysql-5.1-rpl-wl5092 (for server changes)
          luis.soares@sun.com-20100308232552-s3cl2ej07ln63sau
      
        - WL#5096 tests (for test case facilities)
          http://lists.mysql.com/commits/104427 (needs this cset on
          top of revision aforementioned for mysql-5.1-rpl-wl5092).
[30 Mar 2010 12:52] Sven Sandberg
This affects mysqlbinlog --base64=decode-rows --verbose too:

######## Begin test case ########
--source include/master-slave.inc

SET @@session.binlog_row_image = MINIMAL;
CREATE TABLE t1 (a VARCHAR(10), b VARCHAR(10));

--sync_slave_with_master
SET @@global.binlog_row_image = MINIMAL;
--source include/stop_slave.inc
--source include/start_slave.inc

DROP TABLE t1;
CREATE TABLE t1 (a VARCHAR(10));

--connection master
INSERT INTO t1(a) VALUES ('a');
INSERT INTO t1(b) VALUES ('b');
--sync_slave_with_master

--let $datadir= `SELECT @@datadir`
--exec $MYSQL_BINLOG --force-if-open --verbose --base64=decode-rows $datadir/slave-bin.000001
######## END TEST CASE ########

The output from mysqlbinlog contains this:

# at 445
# at 488
#100330 15:49:49 server id 1  end_log_pos 488 	Table_map: `test`.`t1` mapped to number 23
#100330 15:49:49 server id 1  end_log_pos 520 	Write_rows: table id 23 flags: STMT_END_F
### INSERT INTO test.t1
### SET
###   @1='a'
# at 520
[...]
# at 639
# at 682
#100330 15:49:49 server id 1  end_log_pos 682 	Table_map: `test`.`t1` mapped to number 23
#100330 15:49:49 server id 1  end_log_pos 711 	Write_rows: table id 23 flags: STMT_END_F
# at 711
#100330 15:49:49 server id 1  end_log_pos 771 	Query	thread_id=3	exec_time=0	error_code=0

etc.

I think the row decoder should output something like "INSERT INTO t1 VALUES()" instead of nothing, for the last Write_rows_log_event.
[30 Mar 2010 23:11] Luis Soares
There are two more scenarios

  Case #4: The master contains extra columns. Then it later
           updates these extra columns. The table has a PK
           defined over the extra columns:

           MASTER> CREATE TABLE t1 (a INT, b INT, c INT PRIMARY KEY);
           SLAVE>  CREATE TABLE t1 (a INT);
           MASTER> INSERT INTO t1 VALUES (1,1,1);
           MASTER> UPDATE t1 SET b=2 WHERE c=1;
           sync_slave_with_master

           In this case the slave shall skip the event and do no
           operations on the slave table nor raise an error.

  Case #5: The master contains extra columns. Then it later
           updates these extra columns and some of the columns it
           shares with the slave. The table has a PK defined over
           the extra columns:

           MASTER> CREATE TABLE t1 (a INT, b INT, c INT PRIMARY KEY);
           SLAVE>  CREATE TABLE t1 (a INT);
           MASTER> INSERT INTO t1 VALUES (1,1,1);
           MASTER> UPDATE t1 SET b=2, a=100 WHERE c=1;
           sync_slave_with_master

           In this case the slave shall stop with an error:
           HA_ERR_END_OF_FILE.
[30 Mar 2010 23:20] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/104670

3186 Luis Soares	2010-03-31
      BUG#49100: RBR: Unexpected behavior when AI contains no usable
                 data for slave columns
      
      RBR was not fully compliant with empty before and after
      images (BI and AI) in row events. Several faulty scenarios
      surfaced during WL#5092 implementation, especially when MINIMAL
      images were used:
      
        S1. Inserting an empty row on the master would cause slave to
            skip the event (empty AI);
      
            Expected: slave(s) shall apply the insert and fill its
            table with default values as master did.
      
        S2. Master inserts values just for columns that slave doesn't
            have (unusable AI at the slave).
      
            Expected: slave(s) shall apply the insert and fill its
            table with default values as master did for columns that
            user did not provide data.
       
        S3. Master updates columns that only it knows about (unusable
            AI at the slave) but references columns that it shares with
            the slave (usable BI at the slave). This would trigger an
            assertion on a third slave in chained replication: m -> s1
            -> s2, s2 would abort.
      
            Expected: slave s1 shall not update any row and slave s2
            shall not assert.
      
        S4. Master updates and references columns that only it knows
            about (unusable AI and BI at the slave).
      
            Expected: slave(s) shall skip the event and not update its
            tables with any value.
      
        S5. Master references columns that only it knows
            about (unusable BI at the slave) and updates columns shared
            with the slave (usable AI at the slave). Slave must stop.
      
            Expected: slave(s) shall fail with HA_ERR_END_OF_FILE.
      
      We fix S1 to S5 by relaxing some asserts that were triggered on
      when processing empty images. Additionally, on the
      Update_rows_log_event::do_exec_row we deployed a check the
      verifies if there is any value for AI. If there is not, then the
      event is skipped (neither BI nor AI is are processed).
      
      For mysqlbinlog --verbose --base64=decode-rows we print out an
      INSERT INTO t VALUES () when a write rows event contains an
      empty AI.
     @ mysql-test/suite/rpl/r/rpl_row_img.result
        Updates to result files.
     @ mysql-test/suite/rpl/t/rpl_row_img.test
        Added test cases.
     @ sql/log_event.cc
        1. Added printout for INSERT INTO ... VALUES () when AI is empty for
           Write_rows_log_events.
        2. Changed assertion from 'm_curr_row < ...' to 'm_curr_row <= ...'
        3. Calculation of estimated rows for bulk insert is now conditional,
           and dependent of whether m_curr_row < m_rows_end or not.
        4. In the Update_rows_log_event, changed it to take into consideration
           if there are any values in AI. If there are not, then skip event.
     @ sql/log_event.h
        Changed assertion to from 'm_curr_row < ...' to ' m_curr_row <= ...'
     @ sql/rpl_record.cc
        Do not unpack if master did not send any bit set in the write_set.
[21 Apr 2010 0:54] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/106164

3187 Luis Soares	2010-04-21
      BUG#49100: RBR: Unexpected behavior when AI contains no usable
                 data for slave columns
            
      RBR was not fully compliant with empty before and after
      images (BI and AI) in row events. Several faulty scenarios
      surfaced during WL#5092 implementation, especially when MINIMAL
      images were used:
            
        S1. Inserting an empty row on the master would cause slave to
            skip the event (empty AI);
            
            Problem: Master and slave would become out of sync.
      
            Expected: slave(s) shall apply the insert and fill its
            table with default values as master did.
            
        S2. Master inserts values just for columns that slave doesn't
            have (unusable AI at the slave).
            
            Problem: In a chained replication scenario (m -> s1 -> s2),
            m and s1 would be in sync, but s2 would not.
      
            Expected: slave(s) shall apply the insert and fill its
            table with default values as master did for columns that
            user did not provide data.
             
        S3. Master updates columns that only it knows about (unusable
            AI at the slave) but references columns that it shares with
            the slave (usable BI at the slave). This would trigger an
            assertion on a third slave in chained replication: m -> s1
            -> s2, s2 would abort.
      
            Problem: in chained replication scenario (m -> s1 -> s2),
            s2 would trigger an assertion while unpacking the row
            logged by s1.
            
            Expected: slave s1 shall not update any row and slave s2
            shall not assert.
            
        S4. Master updates and references columns that only it knows
            about (unusable AI and BI at the slave).
      
            Problem: although AI does not contain any data for updating
            the slave (update becomes no-op), the slave stops with
            error, because BI does not also contain relevant data for
            finding the record.
            
            Expected: slave(s) shall skip the event, thence not even
            trying to update its tables with any value.
            
        S5. Master references columns that only it knows
            about (unusable BI at the slave) and updates columns shared
            with the slave (usable AI at the slave). Slave must stop.
            
            Problem: none really. Just adding the test case for
            coverage purposes.
      
            Expected: slave(s) shall fail with HA_ERR_END_OF_FILE.
            
      We fix S1 to S3 by relaxing some asserts that were triggered on
      when processing empty images and by doing some changes to
      accomodate the fact that some images might be empty. 
      
      For S4, we deploy a check in Update_rows_log_event::do_exec_row
      that verifies if there is any value in the AI. If there is not,
      then the event is skipped (neither BI nor AI is are processed).
      
      As for S5, we just add the test case.
            
      For mysqlbinlog --verbose --base64=decode-rows we print out an
      INSERT INTO t VALUES () when a write rows event contains an empty
      AI.
     @ mysql-test/suite/rpl/r/rpl_row_img.result
        Updates to result files.
     @ mysql-test/suite/rpl/t/rpl_row_img.test
        Added test cases.
     @ sql/log_event.cc
        1. Added printout for INSERT INTO ... VALUES () when AI is empty for
           Write_rows_log_events.
        2. Changed assertion from 'm_curr_row < ...' to 'm_curr_row <= ...'
        3. Calculation of estimated rows for bulk insert is now conditional,
           and dependent of whether m_curr_row < m_rows_end or not.
        4. In the Update_rows_log_event, changed it to take into consideration
           if there are any values in AI. If there are not, then skip event.
     @ sql/log_event.h
        Changed assertion to from 'm_curr_row < ...' to ' m_curr_row <= ...'.
     @ sql/rpl_record.cc
        Do not unpack if master did not send any bit set in the write_set.
[4 May 2010 9:12] Luis Soares
Related: BUG#53386.
[10 May 2010 23:34] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/107904

3188 Luis Soares	2010-05-11
      BUG#49100: RBR: Unexpected behavior when AI contains no usable data for 
                 slave columns
      
      Addressing review comments:
      
        1. Added NDB test case, even if it does not work (because of
           BUG#52473.
        2. Fixed typo '=' -> '=='
        3. Added missing parenthesis.
     @ mysql-test/extra/rpl_tests/rpl_row_empty_imgs.test
        Engine agnostic part of the test case.
     @ mysql-test/suite/rpl/t/rpl_row_img.test
        Moved agnostic part of the test case into extra/rpl_tests.
        Added tests for InnoDB as well.
     @ mysql-test/suite/rpl_ndb/t/disabled.def
        Added rpl_ndb_row_img (just added) test case. We add it so that
        it can be disabled after BUG#52473 is fixed.
     @ mysql-test/suite/rpl_ndb/t/rpl_ndb_row_img.cnf
        Configuration file for chained NDB replication: master -> slave1 -> slave2.
     @ mysql-test/suite/rpl_ndb/t/rpl_ndb_row_img.test
        Test case for NDB which just sets up NDB and sources the 
        engine agnostic part of the test.
     @ sql/log_event.cc
        Review comments: missing parenthesis and "dangerous" typo!
[26 May 2010 14:38] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/109292

3189 Luis Soares	2010-05-26
      Fixing result file for rpl_row_img (wrong version had been
      checked in BUG#49100 patch).
[8 Oct 2010 13:29] Bugs System
Pushed into mysql-next-mr (revid:alexander.nozdrin@oracle.com-20101008132832-pbzewvmi9f365ak4) (version source revid:alexander.nozdrin@oracle.com-20101008132832-pbzewvmi9f365ak4) (pib:21)
[13 Nov 2010 16:16] Bugs System
Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:vasil.dimov@oracle.com-20100629074804-359l9m9gniauxr94) (merge vers: 5.6.99-m4) (pib:21)
[29 Nov 2010 13:54] Jon Stephens
Doesn't seem to appear in any release versions -- closing without further action at this time.