Bug #26835 Repeatable corruption of utf8-enabled tables inside InnoDB
Submitted: 5 Mar 2007 8:52 Modified: 6 Apr 2007 19:48
Reporter: Domas Mituzas
Status: Closed
Category:Server: InnoDB Severity:S1 (Critical)
Version:5.0-bk, 5.1-bk OS:Linux (Linux, MacOSX, ..)
Assigned to: Marko Mäkelä Target Version:
Tags: innodb, corruption

[5 Mar 2007 8:52] Domas Mituzas
Description:
Immediately rewritten UTF8 data in InnoDB corrupts tables, possibly due to adaptive hash
index problems.

How to repeat:
Run attached testcase.

Suggested fix:
n/a
[5 Mar 2007 8:58] Domas Mituzas
To look for:

mismatching number of records in indexes, error-log entry:
Error: index `phrases$i_id` of table `test`.`phrases` contains 14 entries, should be 15

data in some fields ends up corrupted.
[5 Mar 2007 9:28] Domas Mituzas
Reduced repeatable testcase

Attachment: test2.sql (application/octet-stream, text), 428 bytes.

[5 Mar 2007 9:32] Domas Mituzas
The bug seems to be caused by different (missing?) normalization routines for adaptive
hash index.
[5 Mar 2007 14:16] Marko Mäkelä
The function row_upd_changes_field_size_or_external() should note that UTF-8 CHAR columns
in ROW_FORMAT=COMPACT are actually variable-length, and test the actual storage size of
the new value.
[5 Mar 2007 14:57] Marko Mäkelä
The bug is in rec_offs_nth_size(), which returns invalid data for n==0. The function is
only called by row_upd_changes_field_type_or_external(). Thus, the bug only manifests
itself when attempting to update the first column of a record (in a secondary index or in
the clustered index).

Fix: Add

	if (!n) {
		return(rec_offs_base(offsets)[1 + n] & REC_OFFS_MASK);
	}

before the return statement in rec_offs_nth_size().
[5 Mar 2007 15:19] Marko Mäkelä
The bug should only manifest itself under the following conditions:

(1) the clustered index (primary key or unique key) of the table begins with a char or
varchar column.

(2) a record is deleted

(3) before the record is purged, an equivalent record (under the charset-collation of the
column) is inserted so that the size changes

In test2.sql, the three-byte char U+FF24 (0xefbca4, D) is equivalent to the one-byte
char D.
[5 Mar 2007 20:10] James Day
Marko, just to be certain, please confirm that this can only affect the compact row
format, so it can only happen in versions 5.0 and later.
[5 Mar 2007 21:55] Marko Mäkelä
James,

this bug could affect ROW_FORMAT=REDUNDANT too. The error is that
rec_offs_nth_size(offsets, 0) would subtract rec_offs_extra_size(offsets) from the
result.

In test2.sql, extra_size==6 and the length of the first column (DDD) is 9 bytes.
Because 9-6=3, InnoDB would allow an in-place update with the 3-byte value DDD.

I believe that the test case can be modified for ROW_FORMAT=REDUNDANT by adding
variable-length columns so that extra_size will be a multiple of 3.

Note that in UPDATE, InnoDB uses a binary equality test to enable update-in-place.  This
bug should only affect INSERT by updating a delete-marked record in the clustered index,
and only if the first column of the clustered index uses a collation where byte sequences
of differing length can be considered equal.
[5 Mar 2007 21:58] Marko Mäkelä
Oh yes, I can confirm that this bug only exists in MySQL 5.0.3 and later.  The function
rec_get_offsets() and the functions beginning with rec_offs_ were introduced in MySQL
5.0.3.
[6 Mar 2007 8:25] Marko Mäkelä
Here is a test case for ROW_FORMAT=REDUNDANT:

DROP TABLE IF EXISTS t1;
CREATE TABLE `t1` (
  `word` varchar(50) collate utf8_unicode_ci NOT NULL PRIMARY KEY
) ENGINE=InnoDB ROW_FORMAT=REDUNDANT;

INSERT INTO t1 VALUES (0xEFBCA4EFBCA4EFBCA4EFBCA4EFBCA441);
DELETE FROM t1;
INSERT INTO t1 VALUES  (0x4444444444C384);
SELECT * FROM t1;

That is, we insert and delete the string DDDDDA and then insert the string
DDDDDÄ.
[6 Apr 2007 19:21] Bugs System
Pushed into 5.0.40
[6 Apr 2007 19:23] Bugs System
Pushed into 5.1.18-beta
[6 Apr 2007 19:48] Paul DuBois
Noted in 5.0.40, 5.1.18 changelog.

For InnoDB tables having a clustered index that began with a CHAR or
VARCHAR column, deleting a record and then inserting another before
the deleted record was purged could result in table corruption.