Bug #53893 RBR: nullable unique key can lead to out-of-sync slave
Submitted: 21 May 2010 17:44 Modified: 14 Oct 2010 12:32
Reporter: Luis Soares Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Row Based Replication ( RBR ) Severity:S3 (Non-critical)
Version:5.1.48-bzr OS:Any
Assigned to: Luis Soares CPU Architecture:Any
Tags: INDEX, RBR, unique key
Triage: Triaged: D2 (Serious)

[21 May 2010 17:44] Luis Soares
Description:
If using nullable fields with unique keys the slave can go out of
sync because multiple fields with NULL as value in the unique key
field(s) are allowed. In such scenarios, when the slave searches
for a row, it will take the first row with NULL value for the
unique key and will update that row.

This means that we can have the following:

MASTER> CREATE TABLE t1 (c1 INT, c2 INT, UNIQUE KEY(c1));
MASTER> INSERT INTO t1(c2) VALUES (1), (2), (3);
MASTER> UPDATE t1 SET c2=1000 WHERE c2=3;
MASTER> SELECT * FROM t1;
c1      c2
NULL    1
NULL    2
NULL    1000

-- sync_slave_with_master

SLAVE> SELECT * FROM t1;
c1      c2
NULL    1000
NULL    2
NULL    3

How to repeat:
1. bzr clone -r revid:sven.sandberg@sun.com-20100520153801-yyhujm1qqa4eyfn0 BZR_REPO/mysql-5.1-bugteam
2. cd mysql-5.1-bugteam
3. ./BUIL/compile-pentium64-debug-max
4. cd mysql-test
5. vi suite/rpl/t/rpl_row_unni.test

-- source include/master-slave.inc
-- source include/have_binlog_format_row.inc

CREATE TABLE t1 (c1 INT, c2 INT, UNIQUE KEY(c1));
INSERT INTO t1(c2) VALUES (1), (2), (3);
UPDATE t1 SET c2=1000 WHERE c2=3;
SELECT * FROM t1; 
-- sync_slave_with_master
SELECT * FROM t1; 

-- exit

6. perl mtr rpl_row_unni
7. Watch diff tables show differences.

Suggested fix:
In Rows_log_event::find_row, take into account whether the unique key is 
contains nullable parts or not. If it does, then one needs to do an index
scan.

An optimization would be even if it contains nullable parts, check if the
before image (BI) contains null fields for the unique key parts. If it 
doesn't then we may stop searching and take that row.
[21 May 2010 17:53] Valeriy Kravchuk
Verified just as described with recent mysql-5.1 tree from bzr on Mac OS X:

valeriy-kravchuks-macbook-pro:mysql-test openxs$ ./mtr rpl_row_unni
Logging: ./mtr  rpl_row_unni
100521 20:50:56 [Warning] Setting lower_case_table_names=2 because file system for /var/folders/dX/dXCzvuSlHX4Op1g-o1jIWk+++TI/-Tmp-/7jzksbK4Lv/ is case insensitive
100521 20:50:56 [Note] Plugin 'FEDERATED' is disabled.
100521 20:50:56 [Note] Plugin 'ndbcluster' is disabled.
MySQL Version 5.1.48
Checking supported features...
 - using ndbcluster when necessary, mysqld supports it
 - SSL connections supported
 - binaries are debug compiled
Collecting tests...
 - adding combinations for rpl
vardir: /Users/openxs/dbs/5.1/mysql-test/var
Checking leftover processes...
Removing old var directory...
Creating var directory '/Users/openxs/dbs/5.1/mysql-test/var'...
Installing system database...
Using server port 60634

==============================================================================

TEST                                      RESULT   TIME (ms)
------------------------------------------------------------

worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 13000..13009
rpl.rpl_row_unni 'stmt'                  [ skipped ]  Doesn't support --binlog-format='statement'
rpl.rpl_row_unni 'mix'                   [ skipped ]  Doesn't support --binlog-format='mixed'
rpl.rpl_row_unni 'row'                   [ fail ]
        Test ended at 2010-05-21 20:51:05

CURRENT_TEST: rpl.rpl_row_unni
--- /Users/openxs/dbs/5.1/mysql-test/suite/rpl/r/rpl_row_unni.result	2010-05-21 20:50:35.000000000 +0300
+++ /Users/openxs/dbs/5.1/mysql-test/suite/rpl/r/rpl_row_unni.reject	2010-05-21 20:51:05.000000000 +0300
@@ -0,0 +1,19 @@
+stop slave;
+drop table if exists t1,t2,t3,t4,t5,t6,t7,t8,t9;
+reset master;
+reset slave;
+drop table if exists t1,t2,t3,t4,t5,t6,t7,t8,t9;
+start slave;
+CREATE TABLE t1 (c1 INT, c2 INT, UNIQUE KEY(c1));
+INSERT INTO t1(c2) VALUES (1), (2), (3);
+UPDATE t1 SET c2=1000 WHERE c2=3;
+SELECT * FROM t1;
+c1	c2
+NULL	1
+NULL	2
+NULL	1000
+SELECT * FROM t1;
+c1	c2
+NULL	1000
+NULL	2
+NULL	3
...
[21 May 2010 18:33] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/108922

3393 Luis Soares	2010-05-21
      BUG#53893: RBR: nullable unique key can lead to out-of-sync slave
      
      When using Unique Keys with nullable parts in RBR, the slave can
      choose the wrong row to update. This happens because a table with
      an unique key containing nullable parts is allowed to have
      multiple rows with NULL value for that same unique key fields.
      
      We fix this at the slave by extending the checks before assuming
      that the row found through an unique index is is the correct
      one. This means that when a record is fetched from the storage
      engine and a key that is not primary (say, KEY) is used, we check
      before doing an index scan: 
      
       - key is unique and has no nullable parts 
       - if it has, we check if any field in before image is null
       - if there is one field that is null proceed to index scan 
       - if not return row.
      
      A side change: renamed the existing test case file and added a
      test case covering the changes in this patch.
[2 Jun 2010 22:27] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/110032

3393 Luis Soares	2010-06-02
      BUG#53893: RBR: nullable unique key can lead to out-of-sync slave
      
      When using Unique Keys with nullable parts in RBR, the slave can
      choose the wrong row to update. This happens because a table with
      an unique key containing nullable parts cannot strictly guarantee 
      uniqueness. As stated in the manual, for all engines, a UNIQUE 
      index allows multiple NULL values for columns that can contain 
      NULL.
      
      We fix this at the slave by extending the checks before assuming
      that the row found through an unique index is is the correct
      one. This means that when a record (R) is fetched from the storage
      engine and a key that is not primary (K) is used, the server does 
      the following: 
      
       - If K is unique and has no nullable parts, it returns R;
       - Otherwise, if any field in the before image that is part of K
         is null do an index scan;
       - If there is no NULL field in the BI part of K, then return R.
      
      A side change: renamed the existing test case file and added a
      test case covering the changes in this patch.
[2 Jun 2010 23:24] Luis Soares
Queued in mysql-5.1-bugteam and mysql-trunk-merge:
- http://pb2.norway.sun.com/web.py?template=push_details&push=1308836
- http://pb2.norway.sun.com/web.py?template=push_details&push=1308845
[3 Jun 2010 23:46] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/110179

3412 Luis Soares	2010-06-04
      BUG#53893: RBR: nullable unique key can lead to out-of-sync slave
        
      Post-push fix.
        
      There was a valgrind issue on the loop that checks whether there
      are NULL fields in the UNIQUE KEY or not. In detail, for the last 
      iteration the server may read out of the key_part array boundaries,
      making valgrind to output warnings.
      
      We fix this by correcting the loop, ie, moving the part that reads
      from the key_part to be inside the loop statement block. This way
      the assignment is protected by the loop condition.
[3 Jun 2010 23:59] Luis Soares
Queued post-push fix in mysql-5.1-bugteam and mysql-trunk-merge:
- http://pb2.norway.sun.com/web.py?template=push_details&push=1312272
- http://pb2.norway.sun.com/web.py?template=push_details&push=1312269
[17 Jun 2010 6:12] Bugs System
Pushed into 5.5.5-m3 (revid:alexey.kopytov@sun.com-20100615145247-8bj0vmuqlotbqsn9) (version source revid:luis.soares@sun.com-20100603234730-dlinjjhtu4cqq26a) (merge vers: 5.5.5-m3) (pib:16)
[17 Jun 2010 6:16] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100615150216-cubqoyn1fj9b6a2p) (version source revid:alik@sun.com-20100607075151-xjus8wzgjv3g9ok0) (pib:16)
[13 Jul 2010 10:36] Jon Stephens
Documented in the 5.5.5 changelog as follows:

        When using unique keys on NULL columns in row-based replication,
        the slave sometimes chose the wrong row when performing an
        update. This happened because a table having a unique key on
        such a column could have multiple rows containing NULL for the
        column used by the unique key, and the slave merely picked the
        first row containing NULL in that column.

Set NM status, waiting for merge to 5.1.
[19 Jul 2010 14:34] Bugs System
Pushed into 5.1.49 (revid:build@mysql.com-20100719143034-omcma40sblwmay3x) (version source revid:luis.soares@sun.com-20100603234909-brpbbv22ypfhohbf) (merge vers: 5.1.48) (pib:16)
[21 Jul 2010 14:23] Jon Stephens
Also documented fix in the 5.1.49 changelog. Closed.
[14 Oct 2010 8:28] Bugs System
Pushed into mysql-5.1-telco-7.0 5.1.51-ndb-7.0.20 (revid:martin.skold@mysql.com-20101014082627-jrmy9xbfbtrebw3c) (version source revid:vasil.dimov@oracle.com-20100531152341-x2d4hma644icamh1) (merge vers: 5.5.5-m3) (pib:21)
[14 Oct 2010 8:43] Bugs System
Pushed into mysql-5.1-telco-6.3 5.1.51-ndb-6.3.39 (revid:martin.skold@mysql.com-20101014083757-5qo48b86d69zjvzj) (version source revid:vasil.dimov@oracle.com-20100531152341-x2d4hma644icamh1) (merge vers: 5.5.5-m3) (pib:21)
[14 Oct 2010 8:58] Bugs System
Pushed into mysql-5.1-telco-6.2 5.1.51-ndb-6.2.19 (revid:martin.skold@mysql.com-20101014084420-y54ecj85j5we27oa) (version source revid:vasil.dimov@oracle.com-20100531152341-x2d4hma644icamh1) (merge vers: 5.5.5-m3) (pib:21)
[14 Oct 2010 12:32] Jon Stephens
No new changelog entry required. 

Setting back to Closed state.