Bug #52868 | Wrong handling of NULL value during update, replication out of sync | ||
---|---|---|---|
Submitted: | 15 Apr 2010 21:03 | Modified: | 14 Oct 2010 12:28 |
Reporter: | Matthias Leich | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S2 (Serious) |
Version: | 5.1.37, 5.1.47,5.5.5-m3,5.6.99-m4 | OS: | Any |
Assigned to: | Luis Soares | CPU Architecture: | Any |
Tags: | null |
[15 Apr 2010 21:03]
Matthias Leich
[16 Apr 2010 12:19]
Matthias Leich
The following trees showed the same wrong behaviour. 5.5.5-m3 (mysql-trunk revno: 3008 2010-04-12) 5.6.99-m4 (mysql-next-mr revno: 3139 2010-04-12)
[21 Apr 2010 12:43]
Luis Soares
Analysis ~~~~~~~~ INTRO ===== To update records, roughly the slave (1) unpacks the before image (BI) from the binary log event current row, (2) iterates over the storage engine rows (SER) and compares both. If BI == SER, then the row is used to replay the after image (AI) on top of it. Finally, the updated row is stored back in the SE. This is done at Update_rows_log_event::do_exec_row, which relies in Rows_log_event::find_row to search the correct SER. BI and SER are matched against each other in record_compare, which (in some cases) compares null_bits first and then the actual record contents. A special cleanup for unused null_bits is done before the actual comparison so that don't care bits get the same value for both BI and SER. PROBLEM DESCRIPTION =================== ** P1. Problem depicted by the bug reporter ** The null_bits of a record are set according to the X bit, nullable fields and Field_bit data. So the problem stems from the fact that when: - X bit (if any) + N nullable columns + M Field_bit fields = 8 means that table->s->last_null_bit_pos == 0. At this point, in log_event.cc:record_compare, the part that clears unused null_bits, assumes that all bits in the null_byte should be cleared, thence wrongly overwriting null_bits. The bug surfaces at this point, because when fields contents can perfectly match a BI except for a difference in a null bit, and given that all null bits are (wrongly) cleared before comparing the actual record contents, the wrong pair of records can be considered to match. For the specific case reported, the assignment of f1='a' will ultimately result in (for the first row updated): - null_bit(f1)=0 && f1=0. When replaying the second row update, the BI will match with the first record updated, because: 1. null_bit(f1) = 1 (set wrongly at the beginning of record_compare for SER and BI) 2. f1 field content for BI (NULL) will match f1 field content for SER (0) - see (in record_compare): if ((*ptr)->cmp_binary_offset(table->s->rec_buff_length)) This will cause the first row to be updated twice and thence the difference in results between master and slave. ** P2. Problem found during analysis around X bit setting. ** There is still another issue. The null_bytes are computed according to the presence of HA_OPTION_PACK_RECORD. If it is set, then X bit is not included. So we should also check this before clearing the X bit. The following test case proves that this is indeed an issue: -- source include/master-slave.inc -- source include/have_binlog_format_row.inc CREATE TABLE t1 (c1 int, c2 varchar(1) default ''); INSERT INTO t1(c1) VALUES (10); INSERT INTO t1(c1) VALUES (NULL); UPDATE t1 SET c1= 0; SELECT * FROM t1; -- sync_slave_with_master SELECT * FROM t1; -- connection master DROP TABLE t1; -- sync_slave_with_master -- exit Results show: (...) INSERT INTO t1(c1) VALUES (NULL); UPDATE t1 SET c1= 0; SELECT * FROM t1; c1 c2 0 0 SELECT * FROM t1; c1 c2 0 NULL DROP TABLE t1; FIX === To fix the two issues we should do the following in record_compare: P1. We only clean the unused bits, if: - table->s->null_bytes > 0 && - table->s->last_null_bit_pos > 0 P2. We only set the X bit if: - !(table->s->db_options_in_use & HA_OPTION_PACK_RECORD)
[21 Apr 2010 12:48]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/106253 3356 Luis Soares 2010-04-21 BUG#52868: Wrong handling of NULL value during update, replication out of sync In RBR, sometimes the table->s->last_null_bit_pos can be zero. This has impact at the slave when it compares records fetched from the storage engine against records in the binary log event. If last_null_bit_pos is zero the slave, while comparing in log_event.cc:record_compare function, would set all bits in the last null_byte to 1 (assumed all 8 were unused) . Thence it would loose the ability to distinguish records that were similar in contents except for the fact that some field was null in one record, but not in the other. Ultimately this would cause wrong matches, and in the specific case depicted in the bug report the same record would be updated twice, resulting in a lost update. Additionally, in the record_compare function the slave was setting the X bit unconditionally. There are cases that the X bit does not exist in the record header. This could also lead to wrong matches between records. We fix both by conditionally resetting the bits: (i) unused null_bits are set if last_null_bit_pos > 0; (ii) X bit is set if HA_OPTION_PACK_RECORD is in use. @ mysql-test/extra/rpl_tests/rpl_record_compare.test Shared part of the test case for MyISAM and InnoDB. @ mysql-test/suite/rpl/t/rpl_row_rec_comp_innodb.test InnoDB test case. @ mysql-test/suite/rpl/t/rpl_row_rec_comp_myisam.test MyISAM test case. Added also coverage for Field_bits case. @ sql/log_event.cc Deployed conditional setting of unused bits at record_compare. @ sql/log_event_old.cc Same change as in log_event.cc.
[20 May 2010 1:23]
Luis Soares
Queued in mysql-5.1-bugteam and mysql-pe: - http://pb2.norway.sun.com/web.py?template=push_details&push=1270678 - http://pb2.norway.sun.com/web.py?template=push_details&push=1270677
[28 May 2010 5:46]
Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100524190136-egaq7e8zgkwb9aqi) (version source revid:alik@sun.com-20100524190136-egaq7e8zgkwb9aqi) (pib:16)
[28 May 2010 6:16]
Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20100524190941-nuudpx60if25wsvx) (version source revid:alik@sun.com-20100524190409-5w4l7mje1wk1c90l) (merge vers: 6.0.14-alpha) (pib:16)
[28 May 2010 6:43]
Bugs System
Pushed into 5.5.5-m3 (revid:alik@sun.com-20100524185725-c8k5q7v60i5nix3t) (version source revid:alexey.kopytov@sun.com-20100523204118-0tl3goawu658rxh6) (merge vers: 5.5.5-m3) (pib:16)
[2 Jun 2010 8:49]
Bugs System
Pushed into 5.1.48 (revid:georgi.kodinov@oracle.com-20100602084411-2yu607bslbmgufl3) (version source revid:luis.soares@sun.com-20100519235042-1qdf81hc5az90r3o) (merge vers: 5.1.47) (pib:16)
[3 Jun 2010 11:40]
Jon Stephens
Documented bugfix in the 5.1.48, 5.5.5, and 6.0.14 changelogs as follows: In some cases, attempting to update a column with a value of an incompatible type resulted in a mismatch between master and slave because the column value was set to its implicit default value on the master (as expected), but the same column on the slave was set to NULL. Closed.
[14 Oct 2010 8:26]
Bugs System
Pushed into mysql-5.1-telco-7.0 5.1.51-ndb-7.0.20 (revid:martin.skold@mysql.com-20101014082627-jrmy9xbfbtrebw3c) (version source revid:vasil.dimov@oracle.com-20100513074652-0cvlhgkesgbb2bfh) (merge vers: 5.5.5-m3) (pib:21)
[14 Oct 2010 8:41]
Bugs System
Pushed into mysql-5.1-telco-6.3 5.1.51-ndb-6.3.39 (revid:martin.skold@mysql.com-20101014083757-5qo48b86d69zjvzj) (version source revid:vasil.dimov@oracle.com-20100513074652-0cvlhgkesgbb2bfh) (merge vers: 5.5.5-m3) (pib:21)
[14 Oct 2010 8:56]
Bugs System
Pushed into mysql-5.1-telco-6.2 5.1.51-ndb-6.2.19 (revid:martin.skold@mysql.com-20101014084420-y54ecj85j5we27oa) (version source revid:vasil.dimov@oracle.com-20100513074652-0cvlhgkesgbb2bfh) (merge vers: 5.5.5-m3) (pib:21)
[14 Oct 2010 12:28]
Jon Stephens
No new changelog entry required. Setting back to Closed state.