MySQL Bugs: #39648: Replication failure on RBR + Innodb + 2 bit fields + LIMIT + no PK

Bug #39648	Replication failure on RBR + Innodb + 2 bit fields + LIMIT + no PK
Submitted:	25 Sep 2008 10:34	Modified:	19 Jun 2010 0:24
Reporter:	Philip Stoev	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S2 (Serious)
Version:	5.1, 6.0 bzr	OS:	Any
Assigned to:	Marko Mäkelä	CPU Architecture:	Any

Description:
When replicating a innodb table containing two bit columns but no PK, using RBR, a query containing LIMIT will cause the replication to fail with:

Could not execute Delete_rows event on table test.table0_innodb; handler error HA_ERR_END_OF_FILE

How to repeat:
--source include/master-slave.inc
--incude include/have_innodb.inc

CREATE TABLE t1 (`bit` bit, `bit2` bit) ENGINE=innodb;
INSERT INTO t1 ( `bit` ) VALUES ( 0 );
DELETE FROM t1 WHERE `bit` < 2 LIMIT 1;

--save_master_pos
--connection slave
--sync_with_master

SHOW SLAVE STATUS;

Thank you for the report.

I can not repeat error with 5.1 and with 6.0 I get:

Last_Error      Could not execute Delete_rows event on table test.t1; Unknown table engine 'innodb', Error_code: 1286; Using storage engine MyISAM for table 't1', Error_code: 1266; handler error HA_ERR_END_OF_FILE; the event's master log master-bin.000001, end_log_pos 540

Is it same as in your case? Could you please confirm or reject if problem is repeatable with last 5.1. sources.

From this message

"Unknown table engine
'innodb', Error_code: 1286; Using storage engine MyISAM for table 't1',"

it appears that your servers do not start with innodb enabled. So I can not tell if it is the same bug or not. Please make sure you have innodb enabled and try again.

Thank you for the feedback.

Forgot to start slave with option --innodb. Verified as described with error:

Last_SQL_Errno  0
Last_SQL_Error  Could not execute Delete_rows event on table test.t1; handler error HA_ERR_END_OF_FILE; the event's master log master-bin.000001, end_log_pos 540

On Thu, Oct 23, 2008 at 11:32:24AM +0200, Mats Kindahl wrote:
> Here is a detailed description of the problem and the solution.
>
> The cause of BUG#39648 is the way that InnoDB handles BIT fields that are NULL.
> In this particular case, we have the following scenario:
>
> - We have a table t1(a BIT, b BIT) holding the tuple (0, NULL).
>
> - We get a delete row from the master representing the tuple (0, NULL)
>
> The default record for the table is then [FF 00 00], i.e., the NULL bits are set
> for the record and the contents of the fields are 0.
>
> Delete proceeds by locating the row to delete using one of four different
> techniques, three of these require primary key or index, and the last fall-back
> is using range search. In this case, a range search will be used.
>
> To perform a range search, we call rnd_init() to start the range search over the
> entire table (the same applies when doing an index search using index_next() and
> friends), and then use rnd_next() to fetch the records one by one.
>
> Row-based replication does the following steps for each row:
>
> 1. Copy the default record into record[0] (which is used for searching).
>
>    a) This is done so that engines do not have to update fields that are
>       not in the read set or which are NULL and the memcmp() described below
>       will still result in a match.
>
> 2. Fill in record[0] with the fields received from the master (which is not
>    necessarily a full row). In this particular case, it will be [FD 00 *00*],
>    where the marked byte is the value from the default record while the other 0
>    is the "0" in the tuple. Note that FD == 11111101, i.e., the NULL bit for the
>    first column is clear while the NULL bit for the second column is set (the
>    least significant bit is the X bit).
>
> 3. Copy the resulting record to record[1]
>
> 4. Call rnd_next() to get the next record into record[0]. Here it is expected
>    that the storage engine do not touch any fields in the record unless the read
>    bit is set and the field is non-NULL.
>
>    a) InnoDB pads all the NULL BIT fields with 0x20 (space) *and* sets the NULL
>       bit, with the resulting record [FD 00 20]
>
> 5. The replication code does a memory compare of the record to decide if there
>    is a match.
>
>    a) Since InnoDB wrote 0x20 into the NULL field, they don't match.
>
> The alternative is to do a field-by-field comparison, which will catch the fact
> that the field is NULL and ignore the fact that the value was updated by InnoDB.
> However, that will be horribly slow compared to a simple memcmp(), so in that
> case, I will just add that workaround for the InnoDB engine.

Here is a detailed description of the problem and the solution.

The cause of BUG#39648 is the way that InnoDB handles BIT fields that are NULL.
In this particular case, we have the following scenario:

- Table t1 holds the tuple (0, NULL).

- A delete row arrives from the master representing the tuple (0, NULL), i.e.,
  the tuple in the table shall be deleted.

The default record for the table is then [FF 00 00], i.e., the NULL bits are set
for the record and the contents of the fields are 0. (The first byte holds the NULL bits, the filler bits, and the X bit.) Note that since the NULL bits are set, the values in the record is not looked at when computing the value of the field for presenting to, e.g., a client.

Delete proceeds by locating the row to delete using one of four different
techniques, three of these require primary key or index, and the last fall-back
is using range search. In this case, a range search will be used.

To perform a range search, we call rnd_init() to start the range search over the
entire table and then use rnd_next() to fetch the records one by one. The same applies when doing an index search using index_next() and friends, but we do not consider this case here.

Row-based replication does the following steps for each row:

1. Copy the default record into record[0] (which is used for searching).

   a) This is done so that engines do not have to update fields that are
      not in the read set or which are NULL and the memcmp() described below
      will still result in a match.

   b) It also allows engines that don't know the contents of the row to "fake"
      finding the row and get the correct contents out. This is used by the
      Blackhole engine.

2. Fill in record[0] with the fields received from the master (which is not
   necessarily a full row). In this particular case, it will be [FD 00 *00*],
   where the marked byte is the value from the default record while the other
   00 is the "0" in the tuple to delete. Note that FD == 11111101, i.e., the
   NULL bit for the first column is clear while the NULL bit for the second
   column is set (the least significant bit is the X bit).

3. Copy the resulting record to record[1]

4. Call rnd_next() to get the next record *in the table* into record[0]. Here
   it is expected that the storage engine do not touch any fields in the record
   unless the read bit is set and the field is non-NULL. In the event of a NULL
   field with the read bit set, the engine shall *only* touch the NULL bit (to
   set it).

   a) Unfortunately, since InnoDB treats BIT as equivalent to CHAR(1), the NULL
      BIT field is padded with 0x20 (space) *and* the NULL bit is set, with the
      resulting record [FD 00 20].

5. The replication code does a memory compare of the record to decide if there
   is a match.

   a) Since InnoDB wrote 0x20 into the NULL field, they don't match, even
      though it is the correct record.

Does the row-based replication always call handler::extra(HA_EXTRA_KEYREAD_PRESERVE_FIELDS)?

If it does, I may have a patch, which must pass our internal review first.

If not, then I must point out that CHECKSUM TABLE is apparently still committing the "old sin" of not initializing records[0] between retrieving rows. I found this out by making row_sel_store_mysql_rec() and row_sel_pop_cached_row_for_mysql() avoid writing any data for NULL columns. innodb.test would report mismatch for CHECKSUM TABLE.

I don't think that flags is currently set, but that can be arranged. It seems no other engine is using it anyway, so it should not cause any problems.

Given that there are code paths that do not initialize all fields of records[0] before fetching a row from the storage engine (CHECKSUM TABLE being an example), I think that it would be safer to pad BIT columns with 0x00 instead of 0x20. I will try to come up with such a patch next week.

Marko,

Please consider using the value from the default record instead. In the event that the default value for the field is something that is not 0, there will be a mismatch again for tables like:

    create table t1(a bit default b'1');

BUG#40638 was marked as Duplicate of this bug.

Noted in 5.1.32, 6.0.11 changelogs.

With row-based binary logging, replication of InnoDB tables
containing NULL-valued BIT columns could fail.

Bug #40565 (different symptoms) shares the same cause.

Pushed into 5.0.84 (revid:joro@sun.com-20090707074938-ksah1ibn0vs92cem) (version source revid:satya.bn@sun.com-20090625095026-w8pdxn8ls04g7tcp) (merge vers: 5.0.84) (pib:11)

Pushed into 5.1.37 (revid:joro@sun.com-20090708131116-kyz8iotbum8w9yic) (version source revid:staale.smedseng@sun.com-20090626084621-g6zteyvzehto95j0) (merge vers: 5.1.37) (pib:11)

Pushed into 5.0.84 (revid:joro@sun.com-20090707074938-ksah1ibn0vs92cem) (version source revid:satya.bn@sun.com-20090625095026-w8pdxn8ls04g7tcp) (merge vers: 5.0.84) (pib:11)

Pushed into 5.1.37 (revid:joro@sun.com-20090708131116-kyz8iotbum8w9yic) (version source revid:staale.smedseng@sun.com-20090626084621-g6zteyvzehto95j0) (merge vers: 5.1.37) (pib:11)

Pushed into 5.4.4-alpha (revid:anozdrin@bk-internal.mysql.com-20090710111017-bnh2cau84ug1hvei) (version source revid:sergey.glukhov@sun.com-20090625092510-lq3mm6ytxb2ldfim) (merge vers: 5.4.4-alpha) (pib:11)

Pushed into 5.1.37-ndb-7.0.8 (revid:jonas@mysql.com-20090826132541-yablppc59e3yb54l) (version source revid:jonas@mysql.com-20090826132541-yablppc59e3yb54l) (merge vers: 5.1.37-ndb-7.0.8) (pib:11)

Pushed into 5.1.37-ndb-6.3.27 (revid:jonas@mysql.com-20090826105955-bkj027t47gfbamnc) (version source revid:jonas@mysql.com-20090826105955-bkj027t47gfbamnc) (merge vers: 5.1.37-ndb-6.3.27) (pib:11)

Pushed into 5.1.37-ndb-6.2.19 (revid:jonas@mysql.com-20090825194404-37rtosk049t9koc4) (version source revid:jonas@mysql.com-20090825194404-37rtosk049t9koc4) (merge vers: 5.1.37-ndb-6.2.19) (pib:11)

Pushed into 5.1.35-ndb-7.1.0 (revid:magnus.blaudd@sun.com-20090827163030-6o3kk6r2oua159hr) (version source revid:jonas@mysql.com-20090826132541-yablppc59e3yb54l) (merge vers: 5.1.37-ndb-7.0.8) (pib:11)

Pushed into 5.1.47 (revid:joro@sun.com-20100505145753-ivlt4hclbrjy8eye) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)

Push resulted from incorporation of InnoDB tree. No changes pertinent to this bug.
Re-closing.

Pushed into mysql-next-mr (revid:alik@sun.com-20100524190136-egaq7e8zgkwb9aqi) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (pib:16)

Pushed into 6.0.14-alpha (revid:alik@sun.com-20100524190941-nuudpx60if25wsvx) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)

Pushed into 5.5.5-m3 (revid:alik@sun.com-20100524185725-c8k5q7v60i5nix3t) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)

Push resulted from incorporation of InnoDB tree. No changes pertinent to this bug.
Re-closing.

Pushed into 5.5.5-m3 (revid:alik@sun.com-20100615080459-smuswd9ooeywcxuc) (version source revid:mmakela@bk-internal.mysql.com-20100415070122-1nxji8ym4mao13ao) (merge vers: 5.1.47) (pib:16)

Pushed into mysql-next-mr (revid:alik@sun.com-20100615080558-cw01bzdqr1bdmmec) (version source revid:mmakela@bk-internal.mysql.com-20100415070122-1nxji8ym4mao13ao) (pib:16)

Pushed into 5.1.47-ndb-7.0.16 (revid:martin.skold@mysql.com-20100617114014-bva0dy24yyd67697) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)

Pushed into 5.1.47-ndb-6.2.19 (revid:martin.skold@mysql.com-20100617115448-idrbic6gbki37h1c) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)

Pushed into 5.1.47-ndb-6.3.35 (revid:martin.skold@mysql.com-20100617114611-61aqbb52j752y116) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)

Bug #46597 was marked as a duplicate of bug #40638, the latter which was previously marked as a duplicate of this bug.

See also bug #44360

See also bug #47322