| Bug #26835 | Repeatable corruption of utf8-enabled tables inside InnoDB | ||
|---|---|---|---|
| Submitted: | 5 Mar 2007 8:52 | Modified: | 6 Apr 2007 19:48 |
| Reporter: | Domas Mituzas | ||
| Status: | Closed | ||
| Category: | Server: InnoDB | Severity: | S1 (Critical) |
| Version: | 5.0-bk, 5.1-bk | OS: | Linux (Linux, MacOSX, ..) |
| Assigned to: | Marko Mäkelä | Target Version: | |
| Tags: | innodb, corruption | ||
[5 Mar 2007 8:52]
Domas Mituzas
[5 Mar 2007 8:58]
Domas Mituzas
To look for: mismatching number of records in indexes, error-log entry: Error: index `phrases$i_id` of table `test`.`phrases` contains 14 entries, should be 15 data in some fields ends up corrupted.
[5 Mar 2007 9:28]
Domas Mituzas
Reduced repeatable testcase
Attachment: test2.sql (application/octet-stream, text), 428 bytes.
[5 Mar 2007 9:32]
Domas Mituzas
The bug seems to be caused by different (missing?) normalization routines for adaptive hash index.
[5 Mar 2007 14:16]
Marko Mäkelä
The function row_upd_changes_field_size_or_external() should note that UTF-8 CHAR columns in ROW_FORMAT=COMPACT are actually variable-length, and test the actual storage size of the new value.
[5 Mar 2007 14:57]
Marko Mäkelä
The bug is in rec_offs_nth_size(), which returns invalid data for n==0. The function is
only called by row_upd_changes_field_type_or_external(). Thus, the bug only manifests
itself when attempting to update the first column of a record (in a secondary index or in
the clustered index).
Fix: Add
if (!n) {
return(rec_offs_base(offsets)[1 + n] & REC_OFFS_MASK);
}
before the return statement in rec_offs_nth_size().
[5 Mar 2007 15:19]
Marko Mäkelä
The bug should only manifest itself under the following conditions: (1) the clustered index (primary key or unique key) of the table begins with a char or varchar column. (2) a record is deleted (3) before the record is purged, an equivalent record (under the charset-collation of the column) is inserted so that the size changes In test2.sql, the three-byte char U+FF24 (0xefbca4, D) is equivalent to the one-byte char D.
[5 Mar 2007 20:10]
James Day
Marko, just to be certain, please confirm that this can only affect the compact row format, so it can only happen in versions 5.0 and later.
[5 Mar 2007 21:55]
Marko Mäkelä
James, this bug could affect ROW_FORMAT=REDUNDANT too. The error is that rec_offs_nth_size(offsets, 0) would subtract rec_offs_extra_size(offsets) from the result. In test2.sql, extra_size==6 and the length of the first column (DDD) is 9 bytes. Because 9-6=3, InnoDB would allow an in-place update with the 3-byte value DDD. I believe that the test case can be modified for ROW_FORMAT=REDUNDANT by adding variable-length columns so that extra_size will be a multiple of 3. Note that in UPDATE, InnoDB uses a binary equality test to enable update-in-place. This bug should only affect INSERT by updating a delete-marked record in the clustered index, and only if the first column of the clustered index uses a collation where byte sequences of differing length can be considered equal.
[5 Mar 2007 21:58]
Marko Mäkelä
Oh yes, I can confirm that this bug only exists in MySQL 5.0.3 and later. The function rec_get_offsets() and the functions beginning with rec_offs_ were introduced in MySQL 5.0.3.
[6 Mar 2007 8:25]
Marko Mäkelä
Here is a test case for ROW_FORMAT=REDUNDANT: DROP TABLE IF EXISTS t1; CREATE TABLE `t1` ( `word` varchar(50) collate utf8_unicode_ci NOT NULL PRIMARY KEY ) ENGINE=InnoDB ROW_FORMAT=REDUNDANT; INSERT INTO t1 VALUES (0xEFBCA4EFBCA4EFBCA4EFBCA4EFBCA441); DELETE FROM t1; INSERT INTO t1 VALUES (0x4444444444C384); SELECT * FROM t1; That is, we insert and delete the string DDDDDA and then insert the string DDDDDÄ.
[6 Apr 2007 19:21]
Bugs System
Pushed into 5.0.40
[6 Apr 2007 19:23]
Bugs System
Pushed into 5.1.18-beta
[6 Apr 2007 19:48]
Paul DuBois
Noted in 5.0.40, 5.1.18 changelog. For InnoDB tables having a clustered index that began with a CHAR or VARCHAR column, deleting a record and then inserting another before the deleted record was purged could result in table corruption.
