Bug #39648 Replication failure on RBR + Innodb + 2 bit fields + LIMIT + no PK
Submitted: 25 Sep 2008 12:34 Modified: 12 Mar 23:17
Reporter: Philip Stoev
Status: Closed
Category:Server: InnoDB Severity:S2 (Serious)
Version:5.1, 6.0 bzr OS:Any
Assigned to: Marko Mäkelä Target Version:5.1+
Triage: Triaged: D2 (Serious) / R2 (Low) / E3 (Medium)

[25 Sep 2008 12:34] Philip Stoev
Description:
When replicating a innodb table containing two bit columns but no PK, using RBR, a query
containing LIMIT will cause the replication to fail with:

Could not execute Delete_rows event on table test.table0_innodb; handler error
HA_ERR_END_OF_FILE

How to repeat:
--source include/master-slave.inc
--incude include/have_innodb.inc

CREATE TABLE t1 (`bit` bit, `bit2` bit) ENGINE=innodb;
INSERT INTO t1 ( `bit` ) VALUES ( 0 );
DELETE FROM t1 WHERE `bit` < 2 LIMIT 1;

--save_master_pos
--connection slave
--sync_with_master

SHOW SLAVE STATUS;
[25 Sep 2008 18:41] Sveta Smirnova
Thank you for the report.

I can not repeat error with 5.1 and with 6.0 I get:

Last_Error      Could not execute Delete_rows event on table test.t1; Unknown table
engine 'innodb', Error_code: 1286; Using storage engine MyISAM for table 't1',
Error_code: 1266; handler error HA_ERR_END_OF_FILE; the event's master log
master-bin.000001, end_log_pos 540

Is it same as in your case? Could you please confirm or reject if problem is repeatable
with last 5.1. sources.
[25 Sep 2008 18:59] Philip Stoev
From this message

"Unknown table engine
'innodb', Error_code: 1286; Using storage engine MyISAM for table 't1',"

it appears that your servers do not start with innodb enabled. So I can not tell if it is
the same bug or not. Please make sure you have innodb enabled and try again.
[25 Sep 2008 20:26] Sveta Smirnova
Thank you for the feedback.

Forgot to start slave with option --innodb. Verified as described with error:

Last_SQL_Errno  0
Last_SQL_Error  Could not execute Delete_rows event on table test.t1; handler error
HA_ERR_END_OF_FILE; the event's master log master-bin.000001, end_log_pos 540
[23 Oct 2008 11:39] Lars Thalmann
On Thu, Oct 23, 2008 at 11:32:24AM +0200, Mats Kindahl wrote:
> Here is a detailed description of the problem and the solution.
>
> The cause of BUG#39648 is the way that InnoDB handles BIT fields that are NULL.
> In this particular case, we have the following scenario:
>
> - We have a table t1(a BIT, b BIT) holding the tuple (0, NULL).
>
> - We get a delete row from the master representing the tuple (0, NULL)
>
> The default record for the table is then [FF 00 00], i.e., the NULL bits are set
> for the record and the contents of the fields are 0.
>
> Delete proceeds by locating the row to delete using one of four different
> techniques, three of these require primary key or index, and the last fall-back
> is using range search. In this case, a range search will be used.
>
> To perform a range search, we call rnd_init() to start the range search over the
> entire table (the same applies when doing an index search using index_next() and
> friends), and then use rnd_next() to fetch the records one by one.
>
> Row-based replication does the following steps for each row:
>
> 1. Copy the default record into record[0] (which is used for searching).
>
>    a) This is done so that engines do not have to update fields that are
>       not in the read set or which are NULL and the memcmp() described below
>       will still result in a match.
>
> 2. Fill in record[0] with the fields received from the master (which is not
>    necessarily a full row). In this particular case, it will be [FD 00 *00*],
>    where the marked byte is the value from the default record while the other 0
>    is the "0" in the tuple. Note that FD == 11111101, i.e., the NULL bit for the
>    first column is clear while the NULL bit for the second column is set (the
>    least significant bit is the X bit).
>
> 3. Copy the resulting record to record[1]
>
> 4. Call rnd_next() to get the next record into record[0]. Here it is expected
>    that the storage engine do not touch any fields in the record unless the read
>    bit is set and the field is non-NULL.
>
>    a) InnoDB pads all the NULL BIT fields with 0x20 (space) *and* sets the NULL
>       bit, with the resulting record [FD 00 20]
>
> 5. The replication code does a memory compare of the record to decide if there
>    is a match.
>
>    a) Since InnoDB wrote 0x20 into the NULL field, they don't match.
>
> The alternative is to do a field-by-field comparison, which will catch the fact
> that the field is NULL and ignore the fact that the value was updated by InnoDB.
> However, that will be horribly slow compared to a simple memcmp(), so in that
> case, I will just add that workaround for the InnoDB engine.
[24 Oct 2008 9:18] Mats Kindahl
Here is a detailed description of the problem and the solution.

The cause of BUG#39648 is the way that InnoDB handles BIT fields that are NULL.
In this particular case, we have the following scenario:

- Table t1 holds the tuple (0, NULL).

- A delete row arrives from the master representing the tuple (0, NULL), i.e.,
  the tuple in the table shall be deleted.

The default record for the table is then [FF 00 00], i.e., the NULL bits are set
for the record and the contents of the fields are 0. (The first byte holds the NULL bits,
the filler bits, and the X bit.) Note that since the NULL bits are set, the values in the
record is not looked at when computing the value of the field for presenting to, e.g., a
client.

Delete proceeds by locating the row to delete using one of four different
techniques, three of these require primary key or index, and the last fall-back
is using range search. In this case, a range search will be used.

To perform a range search, we call rnd_init() to start the range search over the
entire table and then use rnd_next() to fetch the records one by one. The same applies
when doing an index search using index_next() and friends, but we do not consider this
case here.

Row-based replication does the following steps for each row:

1. Copy the default record into record[0] (which is used for searching).

   a) This is done so that engines do not have to update fields that are
      not in the read set or which are NULL and the memcmp() described below
      will still result in a match.

   b) It also allows engines that don't know the contents of the row to "fake"
      finding the row and get the correct contents out. This is used by the
      Blackhole engine.

2. Fill in record[0] with the fields received from the master (which is not
   necessarily a full row). In this particular case, it will be [FD 00 *00*],
   where the marked byte is the value from the default record while the other
   00 is the "0" in the tuple to delete. Note that FD == 11111101, i.e., the
   NULL bit for the first column is clear while the NULL bit for the second
   column is set (the least significant bit is the X bit).

3. Copy the resulting record to record[1]

4. Call rnd_next() to get the next record *in the table* into record[0]. Here
   it is expected that the storage engine do not touch any fields in the record
   unless the read bit is set and the field is non-NULL. In the event of a NULL
   field with the read bit set, the engine shall *only* touch the NULL bit (to
   set it).

   a) Unfortunately, since InnoDB treats BIT as equivalent to CHAR(1), the NULL
      BIT field is padded with 0x20 (space) *and* the NULL bit is set, with the
      resulting record [FD 00 20].

5. The replication code does a memory compare of the record to decide if there
   is a match.

   a) Since InnoDB wrote 0x20 into the NULL field, they don't match, even
      though it is the correct record.
[10 Dec 2008 13:19] Marko Mäkelä
Does the row-based replication always call
handler::extra(HA_EXTRA_KEYREAD_PRESERVE_FIELDS)?

If it does, I may have a patch, which must pass our internal review first.

If not, then I must point out that CHECKSUM TABLE is apparently still committing the "old
sin" of not initializing records[0] between retrieving rows. I found this out by making
row_sel_store_mysql_rec() and row_sel_pop_cached_row_for_mysql() avoid writing any data
for NULL columns. innodb.test would report mismatch for CHECKSUM TABLE.
[10 Dec 2008 14:54] Mats Kindahl
I don't think that flags is currently set, but that can be arranged. It seems no other
engine is using it anyway, so it should not cause any problems.
[12 Dec 2008 15:16] Marko Mäkelä
Given that there are code paths that do not initialize all fields of records[0] before
fetching a row from the storage engine (CHECKSUM TABLE being an example), I think that it
would be safer to pad BIT columns with 0x00 instead of 0x20. I will try to come up with
such a patch next week.
[17 Dec 2008 9:07] Mats Kindahl
Marko,

Please consider using the value from the default record instead. In the event that the
default value for the field is something that is not 0, there will be a mismatch again
for tables like:

    create table t1(a bit default b'1');
[4 Mar 8:01] Zhenxing He
BUG#40638 was marked as Duplicate of this bug.
[12 Mar 23:17] Paul DuBois
Noted in 5.1.32, 6.0.11 changelogs.

With row-based binary logging, replication of InnoDB tables
containing NULL-valued BIT columns could fail.
[3 Jun 10:58] Marko Mäkelä
Bug #40565 (different symptoms) shares the same cause.
[7 Jul 9:52] Bugs System
Pushed into 5.0.84 (revid:joro@sun.com-20090707074938-ksah1ibn0vs92cem) (version source
revid:satya.bn@sun.com-20090625095026-w8pdxn8ls04g7tcp) (merge vers: 5.0.84) (pib:11)
[8 Jul 15:30] Bugs System
Pushed into 5.1.37 (revid:joro@sun.com-20090708131116-kyz8iotbum8w9yic) (version source
revid:staale.smedseng@sun.com-20090626084621-g6zteyvzehto95j0) (merge vers: 5.1.37)
(pib:11)
[9 Jul 9:35] Bugs System
Pushed into 5.0.84 (revid:joro@sun.com-20090707074938-ksah1ibn0vs92cem) (version source
revid:satya.bn@sun.com-20090625095026-w8pdxn8ls04g7tcp) (merge vers: 5.0.84) (pib:11)
[9 Jul 9:36] Bugs System
Pushed into 5.1.37 (revid:joro@sun.com-20090708131116-kyz8iotbum8w9yic) (version source
revid:staale.smedseng@sun.com-20090626084621-g6zteyvzehto95j0) (merge vers: 5.1.37)
(pib:11)
[10 Jul 13:21] Bugs System
Pushed into 5.4.4-alpha
(revid:anozdrin@bk-internal.mysql.com-20090710111017-bnh2cau84ug1hvei) (version source
revid:sergey.glukhov@sun.com-20090625092510-lq3mm6ytxb2ldfim) (merge vers: 5.4.4-alpha)
(pib:11)
[26 Aug 15:46] Bugs System
Pushed into 5.1.37-ndb-7.0.8 (revid:jonas@mysql.com-20090826132541-yablppc59e3yb54l)
(version source revid:jonas@mysql.com-20090826132541-yablppc59e3yb54l) (merge vers:
5.1.37-ndb-7.0.8) (pib:11)
[26 Aug 15:46] Bugs System
Pushed into 5.1.37-ndb-6.3.27 (revid:jonas@mysql.com-20090826105955-bkj027t47gfbamnc)
(version source revid:jonas@mysql.com-20090826105955-bkj027t47gfbamnc) (merge vers:
5.1.37-ndb-6.3.27) (pib:11)
[26 Aug 15:48] Bugs System
Pushed into 5.1.37-ndb-6.2.19 (revid:jonas@mysql.com-20090825194404-37rtosk049t9koc4)
(version source revid:jonas@mysql.com-20090825194404-37rtosk049t9koc4) (merge vers:
5.1.37-ndb-6.2.19) (pib:11)
[27 Aug 18:33] Bugs System
Pushed into 5.1.35-ndb-7.1.0 (revid:magnus.blaudd@sun.com-20090827163030-6o3kk6r2oua159hr)
(version source revid:jonas@mysql.com-20090826132541-yablppc59e3yb54l) (merge vers:
5.1.37-ndb-7.0.8) (pib:11)