| Bug #52868 | Wrong handling of NULL value during update, replication out of sync | ||
|---|---|---|---|
| Submitted: | 15 Apr 2010 21:03 | Modified: | 14 Oct 2010 12:28 |
| Reporter: | Matthias Leich | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: Replication | Severity: | S2 (Serious) |
| Version: | 5.1.37, 5.1.47,5.5.5-m3,5.6.99-m4 | OS: | Any |
| Assigned to: | Luis Soares | CPU Architecture: | Any |
| Tags: | null | ||
[16 Apr 2010 12:19]
Matthias Leich
The following trees showed the same wrong behaviour. 5.5.5-m3 (mysql-trunk revno: 3008 2010-04-12) 5.6.99-m4 (mysql-next-mr revno: 3139 2010-04-12)
[21 Apr 2010 12:43]
Luis Soares
Analysis
~~~~~~~~
INTRO
=====
To update records, roughly the slave (1) unpacks the before image
(BI) from the binary log event current row, (2) iterates over the
storage engine rows (SER) and compares both. If BI == SER, then the
row is used to replay the after image (AI) on top of it. Finally,
the updated row is stored back in the SE.
This is done at Update_rows_log_event::do_exec_row, which
relies in Rows_log_event::find_row to search the correct SER.
BI and SER are matched against each other in record_compare,
which (in some cases) compares null_bits first and then the
actual record contents. A special cleanup for unused null_bits
is done before the actual comparison so that don't care bits
get the same value for both BI and SER.
PROBLEM DESCRIPTION
===================
** P1. Problem depicted by the bug reporter **
The null_bits of a record are set according to the X bit, nullable
fields and Field_bit data. So the problem stems from the fact that
when:
- X bit (if any) + N nullable columns + M Field_bit fields = 8
means that table->s->last_null_bit_pos == 0. At this point, in
log_event.cc:record_compare, the part that clears unused null_bits,
assumes that all bits in the null_byte should be cleared, thence
wrongly overwriting null_bits.
The bug surfaces at this point, because when fields contents can
perfectly match a BI except for a difference in a null bit, and
given that all null bits are (wrongly) cleared before comparing the
actual record contents, the wrong pair of records can be considered
to match.
For the specific case reported, the assignment of f1='a' will
ultimately result in (for the first row updated):
- null_bit(f1)=0 && f1=0.
When replaying the second row update, the BI will match with
the first record updated, because:
1. null_bit(f1) = 1 (set wrongly at the beginning of
record_compare for SER and BI)
2. f1 field content for BI (NULL) will match f1 field content
for SER (0) - see (in record_compare):
if ((*ptr)->cmp_binary_offset(table->s->rec_buff_length))
This will cause the first row to be updated twice and thence
the difference in results between master and slave.
** P2. Problem found during analysis around X bit setting. **
There is still another issue. The null_bytes are computed
according to the presence of HA_OPTION_PACK_RECORD. If it is
set, then X bit is not included. So we should also check this
before clearing the X bit. The following test case proves
that this is indeed an issue:
-- source include/master-slave.inc
-- source include/have_binlog_format_row.inc
CREATE TABLE t1 (c1 int, c2 varchar(1) default '');
INSERT INTO t1(c1) VALUES (10);
INSERT INTO t1(c1) VALUES (NULL);
UPDATE t1 SET c1= 0;
SELECT * FROM t1;
-- sync_slave_with_master
SELECT * FROM t1;
-- connection master
DROP TABLE t1;
-- sync_slave_with_master
-- exit
Results show:
(...)
INSERT INTO t1(c1) VALUES (NULL);
UPDATE t1 SET c1= 0;
SELECT * FROM t1;
c1 c2
0
0
SELECT * FROM t1;
c1 c2
0
NULL
DROP TABLE t1;
FIX
===
To fix the two issues we should do the following in record_compare:
P1. We only clean the unused bits, if:
- table->s->null_bytes > 0 &&
- table->s->last_null_bit_pos > 0
P2. We only set the X bit if:
- !(table->s->db_options_in_use & HA_OPTION_PACK_RECORD)
[21 Apr 2010 12:48]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/106253 3356 Luis Soares 2010-04-21 BUG#52868: Wrong handling of NULL value during update, replication out of sync In RBR, sometimes the table->s->last_null_bit_pos can be zero. This has impact at the slave when it compares records fetched from the storage engine against records in the binary log event. If last_null_bit_pos is zero the slave, while comparing in log_event.cc:record_compare function, would set all bits in the last null_byte to 1 (assumed all 8 were unused) . Thence it would loose the ability to distinguish records that were similar in contents except for the fact that some field was null in one record, but not in the other. Ultimately this would cause wrong matches, and in the specific case depicted in the bug report the same record would be updated twice, resulting in a lost update. Additionally, in the record_compare function the slave was setting the X bit unconditionally. There are cases that the X bit does not exist in the record header. This could also lead to wrong matches between records. We fix both by conditionally resetting the bits: (i) unused null_bits are set if last_null_bit_pos > 0; (ii) X bit is set if HA_OPTION_PACK_RECORD is in use. @ mysql-test/extra/rpl_tests/rpl_record_compare.test Shared part of the test case for MyISAM and InnoDB. @ mysql-test/suite/rpl/t/rpl_row_rec_comp_innodb.test InnoDB test case. @ mysql-test/suite/rpl/t/rpl_row_rec_comp_myisam.test MyISAM test case. Added also coverage for Field_bits case. @ sql/log_event.cc Deployed conditional setting of unused bits at record_compare. @ sql/log_event_old.cc Same change as in log_event.cc.
[20 May 2010 1:23]
Luis Soares
Queued in mysql-5.1-bugteam and mysql-pe: - http://pb2.norway.sun.com/web.py?template=push_details&push=1270678 - http://pb2.norway.sun.com/web.py?template=push_details&push=1270677
[28 May 2010 5:46]
Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100524190136-egaq7e8zgkwb9aqi) (version source revid:alik@sun.com-20100524190136-egaq7e8zgkwb9aqi) (pib:16)
[28 May 2010 6:16]
Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20100524190941-nuudpx60if25wsvx) (version source revid:alik@sun.com-20100524190409-5w4l7mje1wk1c90l) (merge vers: 6.0.14-alpha) (pib:16)
[28 May 2010 6:43]
Bugs System
Pushed into 5.5.5-m3 (revid:alik@sun.com-20100524185725-c8k5q7v60i5nix3t) (version source revid:alexey.kopytov@sun.com-20100523204118-0tl3goawu658rxh6) (merge vers: 5.5.5-m3) (pib:16)
[2 Jun 2010 8:49]
Bugs System
Pushed into 5.1.48 (revid:georgi.kodinov@oracle.com-20100602084411-2yu607bslbmgufl3) (version source revid:luis.soares@sun.com-20100519235042-1qdf81hc5az90r3o) (merge vers: 5.1.47) (pib:16)
[3 Jun 2010 11:40]
Jon Stephens
Documented bugfix in the 5.1.48, 5.5.5, and 6.0.14 changelogs as follows:
In some cases, attempting to update a column with a value of an
incompatible type resulted in a mismatch between master and
slave because the column value was set to its implicit default
value on the master (as expected), but the same column on the
slave was set to NULL.
Closed.
[14 Oct 2010 8:26]
Bugs System
Pushed into mysql-5.1-telco-7.0 5.1.51-ndb-7.0.20 (revid:martin.skold@mysql.com-20101014082627-jrmy9xbfbtrebw3c) (version source revid:vasil.dimov@oracle.com-20100513074652-0cvlhgkesgbb2bfh) (merge vers: 5.5.5-m3) (pib:21)
[14 Oct 2010 8:41]
Bugs System
Pushed into mysql-5.1-telco-6.3 5.1.51-ndb-6.3.39 (revid:martin.skold@mysql.com-20101014083757-5qo48b86d69zjvzj) (version source revid:vasil.dimov@oracle.com-20100513074652-0cvlhgkesgbb2bfh) (merge vers: 5.5.5-m3) (pib:21)
[14 Oct 2010 8:56]
Bugs System
Pushed into mysql-5.1-telco-6.2 5.1.51-ndb-6.2.19 (revid:martin.skold@mysql.com-20101014084420-y54ecj85j5we27oa) (version source revid:vasil.dimov@oracle.com-20100513074652-0cvlhgkesgbb2bfh) (merge vers: 5.5.5-m3) (pib:21)
[14 Oct 2010 12:28]
Jon Stephens
No new changelog entry required. Setting back to Closed state.

Description: My script: ---------- --source include/have_binlog_format_row.inc --source include/master-slave.inc --disable_warnings DROP TABLE IF EXISTS t1; --enable_warnings CREATE TABLE t1 ( f10 bigint(20) DEFAULT 0, f11 bigint(20) DEFAULT 0, f12 bigint(20) DEFAULT 0, f13 varchar(1) DEFAULT '', f1 bigint(20) DEFAULT 0, f4 bigint(20) DEFAULT 0, f2 bigint(20) DEFAULT 0, f3 bigint(20) DEFAULT 0 ) ENGINE=MyISAM DEFAULT CHARSET=latin1; INSERT INTO t1 ( f1, f2 ) VALUES ( 1 , 35 ); INSERT INTO t1 ( f1, f2 ) VALUES ( NULL, 35 ); UPDATE t1 SET f1 = 'a'; SELECT 'On Master',f1,f2 FROM t1; --sync_slave_with_master --connection slave SELECT 'On Slave',f1,f2 FROM t1; --connection master DROP TABLE IF EXISTS t1; Result on 5.1.47 (mysql-5.1 revno: 3446 2010-04-12): ---------------------------------------------------- stop slave; drop table if exists t1,t2,t3,t4,t5,t6,t7,t8,t9; reset master; reset slave; drop table if exists t1,t2,t3,t4,t5,t6,t7,t8,t9; start slave; DROP TABLE IF EXISTS t1; CREATE TABLE t1 ( f10 bigint(20) DEFAULT 0, f11 bigint(20) DEFAULT 0, f12 bigint(20) DEFAULT 0, f13 varchar(1) DEFAULT '', f1 bigint(20) DEFAULT 0, f4 bigint(20) DEFAULT 0, f2 bigint(20) DEFAULT 0, f3 bigint(20) DEFAULT 0 ) ENGINE=MyISAM DEFAULT CHARSET=latin1; INSERT INTO t1 ( f1, f2 ) VALUES ( 1 , 35 ); INSERT INTO t1 ( f1, f2 ) VALUES ( NULL, 35 ); UPDATE t1 SET f1 = 'a'; Warnings: Warning 1366 Incorrect integer value: 'a' for column 'f1' at row 1 Warning 1366 Incorrect integer value: 'a' for column 'f1' at row 2 <== These warnings are ok. SELECT 'On Master',f1,f2 FROM t1; On Master f1 f2 On Master 0 35 On Master 0 35 <== This result is ok. SELECT 'On Slave',f1,f2 FROM t1; On Slave f1 f2 On Slave 0 35 On Slave NULL 35 <== Why is f1 NULL for the second row? The replication is out of sync. DROP TABLE IF EXISTS t1; I will add results for version 5.1+ soon. How to repeat: ./mtr --mysqld=--binlog-format=row rpl_ml1