Bug #28138 indexing column prefixes produces corruption in InnoDB
Submitted: 27 Apr 2007 16:42 Modified: 18 Jun 2010 22:30
Reporter: Matthew Lord Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S1 (Critical)
Version:5.0.30, 5.0.38, 5.0.42BK, 5.1.18BK OS:Any
Assigned to: Marko Mäkelä CPU Architecture:Any
Tags: bfsm_2007_05_03, corruption, crashing, innodb, Prefix Indexing

[27 Apr 2007 16:42] Matthew Lord
Description:
In 5.0 InnoDB added support for indexing a prefix of a column, this leads to
corruption or at least reports of it.  It's been reported that this also 
causes mysqld to crash in certain types of usage.

How to repeat:
CREATE TABLE `prefix_test` ( `lt` varbinary(4000) NOT NULL, KEY `lt_index` (`lt`(255))) ENGINE=InnoDB;

insert into prefix_test (lt) values ("alsdf;lkasdpfusssssssssssssssssffffffffffffffffffffffffffffffffkkkkkkkkkkkkkkkkkkkkkk2222222222222222222222222222225555555555555555555555555555555555999999999999999999999999999999999999999999999912123123123;1j23;lkj123;l1k2j31;l2kj31;2lk3j12;3lkj12;3lkj12;3lkj123;lkjapdoiufapsoiudfpaoisdufapsodiufaspdoifuaspdoifuaspdoifuasdpofiu098370987340198134123pi;klajdf;lakjad;fkasdf/mnasd/fnas.dmfnasdf/nasdf/masdf;lkajvzx;ckjvpiuyqweopuryqepoiruqwpeiruqwpeiruqwerha;dkfjasdfn/mzncv/mn/m,n/mn/:Cvjhoiuqepriuqpweiurqwepioruqwperiu07a;jasdkfasdnbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb3333333333333jjjjjjjjkkkkkkkkaaaaaaaa;kljad;fkljasd;fkjasd;lkfjasd;lkfjas;dlkfjas;dlkfjpiuqerpoiquwepriuqweriuqyweruytqweurytqwiuerygqwehrbqwlebrqc lwebn kcvzxcmvz/xcmvzx/cvzx/cmvzxc     kljasd;kfjpoiu'qkjq'wekrjq'kerqwe[prqweiuyaua9840173407861234a;kjad;fjasdf asdfasl;dkfjas dflkjasdf as;dkfj as;dfjasdtyfaisudyf asdfiy asdif yasidufy asdif yasduf asdf;asd fasdf adf a asdfalsdfj asd fasdkfjpqieuyrqpiuweyr'adfj;adsfasdklfj asdfkjasdfa asdfasdf asdfj asdfjasldkfja sdf qqerjqwker qwer jqwer;kqwjerpqweur03847501234786510345b asdf");

CHECK TABLE prefix_test;

You can also check the error log and you will see the corruption reported there.

Suggested fix:
Fix the corruption or the false reports of it.
[2 May 2007 10:08] Mads Martin Joergensen
Heikki, you agree it's an innodb bug?
[3 May 2007 3:12] Shane Bester
slightly randomized testcase to cause corruptions.  Run for ~1 minute and observe errors logs of server...

Attachment: bug28138.c (text/plain), 6.06 KiB.

[3 May 2007 3:14] Shane Bester
my complete error log from the above testcase.

Attachment: www.zip (application/zip, text), 209.68 KiB.

[3 May 2007 12:29] James Day
Regression bug. Reproduced with 5.0.40 mysqld-nt which crashed but not with 5.0.27.
[3 May 2007 14:51] James Day
5.0.27 and 5.0.28 work fine. 5.0.30, 5.0.36, 5.0.40 crash with corruption found by check table, all tests with mysqld-nt.
[4 May 2007 13:09] Heikki Tuuri
Assign to Vasil.
[15 May 2007 19:55] James Day
I've updated the synopsis from "indexing column prefixes reports as false corruption" to "indexing column prefixes produces corruption in InnoDB".
[22 May 2007 13:06] Heikki Tuuri
Could this be the same bug: http://bugs.mysql.com/bug.php?id=28125
[23 May 2007 17:14] Heikki Tuuri
The bug probably found by Vasil and me:

rem0rec.c in 5.0:

/*************************************************************
Builds a new-style physical record out of a data tuple and
stores it beginning from the start of the given buffer. */
static
rec_t*
rec_convert_dtuple_to_rec_new(
/*==========================*/
                                /* out: pointer to the origin
                                of physical record */
        byte*           buf,    /* in: start address of the physical record */
        dict_index_t*   index,  /* in: record descriptor */
        dtuple_t*       dtuple) /* in: data tuple */
{
...

        for (i = 0; i < n_fields; i++) {
                field = dtuple_get_nth_field(dtuple, i);
                type = dfield_get_type(field);
...
                if (fixed_len) {
                        ut_ad(len == fixed_len);
                } else {
                        ut_ad(len <= dtype_get_len(type)
                                || dtype_get_mtype(type) == DATA_BLOB);
                        if (len < 128 || (dtype_get_len(type) < 256
                                && dtype_get_mtype(type) != DATA_BLOB)) {
                                *lens-- = (byte) len;

                                printf(
                                "Storing len to 1 byte: %lu\n", (ulint) len);

                        }
                        else {
                                /* the extern bits will be set later */
                                ut_ad(len < 16384);
                                *lens-- = (byte) (len >> 8) | 0x80;
                                *lens-- = (byte) len;

                                printf(
                                "Storing len to 2 bytes: %lu\n", (ulint) len);
                        }
                }

############ ABOVE WE LOOK AT the type of the dtuple field
############ BUT BELOW:

/**********************************************************
The following function determines the offsets to each field in the
record.  The offsets are written to a previously allocated array of
ulint, where rec_offs_n_fields(offsets) has been initialized to the
number of fields in the record.  The rest of the array will be
initialized by this function.  rec_offs_base(offsets)[0] will be set
to the extra size (if REC_OFFS_COMPACT is set, the record is in the
new format), and rec_offs_base(offsets)[1..n_fields] will be set to
offsets past the end of fields 0..n_fields, or to the beginning of
fields 1..n_fields+1.  When the high-order bit of the offset at [i+1]
is set (REC_OFFS_SQL_NULL), the field i is NULL.  When the second
high-order bit of the offset at [i+1] is set (REC_OFFS_EXTERNAL), the
field i is being stored externally. */
static
void
rec_init_offsets(
/*=============*/
                                /* out: the offsets */
        rec_t*          rec,    /* in: physical record */
        dict_index_t*   index,  /* in: record descriptor */
        ulint*          offsets)/* in/out: array of offsets;
                                in: n=rec_offs_n_fields(offsets) */
{
...
                        if (UNIV_UNLIKELY(!field->fixed_len)) {
                                /* Variable-length field: read the length */
                                dtype_t*        type = dict_col_get_type(
                                                dict_field_get_col(field));
                                len = *lens--;
                                if (UNIV_UNLIKELY(dtype_get_len(type) > 255)
                                    || UNIV_UNLIKELY(dtype_get_mtype(type)
                                                        == DATA_BLOB)) {
                                        if (len & 0x80) {
                                                /* 1exxxxxxx xxxxxxxx */
                                                len <<= 8;
                                                len |= *lens--;

                                                offs += len & 0x3fff;
                                                if (UNIV_UNLIKELY(len
                                                                & 0x4000)) {
                                                        len = offs
                                                        | REC_OFFS_EXTERNAL;
                                                } else {
                                                        len = offs;
                                                }

                                                goto resolved;
                                        }
                                }

                                len = offs += len;
...

########### WE LOOK AT THE TYPE OF THE COLUMN!

The 'length' of the type for the dtuple field may be different from the 'length' of the column type for column prefix indexes.

Are there other places in the code where we have the same bug?
[23 May 2007 17:32] Heikki Tuuri
I cannot repeat the bug with 5.0:

heikki@ws35:~/mysql-5.0/client$ ./mysql test
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.0.38-debug-log Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> insert into pt values ("VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV
    "> VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV");                    Query OK, 1 row affected (0.00 sec)

mysql> check table pt;
+---------+-------+----------+----------+
| Table   | Op    | Msg_type | Msg_text |
+---------+-------+----------+----------+
| test.pt | check | status   | OK       |
+---------+-------+----------+----------+
1 row in set (0.00 sec)

mysql>

heikki@ws35:~/mysql-5.0/sql$ ./mysqld
070523 20:31:06 [Warning] Changed limits: max_open_files: 1024  max_connections: 886  table_cache: 64
070523 20:31:06  InnoDB: Started; log sequence number 0 48188
070523 20:31:06 [Note] /home/heikki/mysql-5.0/sql/.libs/lt-mysqld: ready for connections.
Version: '5.0.38-debug-log'  socket: '/home/heikki/bugsocket'  port: 3307  Source distribution
dfield type len 6
dfield type len 6
dfield type len 7
dfield type len 256
dfield type len 256
dfield type len 6
Storing len to 2 bytes: 129
Storing len to 2 bytes: 128
[23 May 2007 17:37] Heikki Tuuri
Also:

mysql> CREATE TABLE `prefix_test` ( `lt` varbinary(4000) NOT NULL, KEY `lt_index`
    -> (`lt`(255))) ENGINE=InnoDB;
Query OK, 0 rows affected (0.01 sec)

mysql> insert into prefix_test (lt) values
    -> ("alsdf;lkasdpfusssssssssssssssssffffffffffffffffffffffffffffffffkkkkkkkkkkkkkkkkkkkkkk2222222222222222222222222222225555555555555555555555555555555555999999999999999999999999999999999999999999999912123123123;1j23;lkj123;l1k2j31;l2kj31;2lk3j12;3lkj12;3lkj12;3lkj123;lkjapdoiufapsoiudfpaoisdufapsodiufaspdoifuaspdoifuaspdoifuasdpofiu098370987340198134123pi;klajdf;lakjad;fkasdf/mnasd/fnas.dmfnasdf/nasdf/masdf;lkajvzx;ckjvpiuyqweopuryqepoiruqwpeiruqwpeiruqwerha;dkfjasdfn/mzncv/mn/m,n/mn/:Cvjhoiuqepriuqpweiurqwepioruqwperiu07a;jasdkfasdnbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb3333333333333jjjjjjjjkkkkkkkkaaaaaaaa;kljad;fkljasd;fkjasd;lkfjasd;lkfjas;dlkfjas;dlkfjpiuqerpoiquwepriuqweriuqyweruytqweurytqwiuerygqwehrbqwlebrqclwebn kcvzxcmvz/xcmvzx/cvzx/cmvzxc    kljasd;kfjpoiu'qkjq'wekrjq'kerqwe[prqweiuyaua9840173407861234a;kjad;fjasdfasdfasl;dkfjas dflkjasdf as;dkfj as;dfjasdtyfaisudyf asdfiy asdif yasidufy asdif
    "> yasduf asdf;asd fasdf adf a asdfalsdfj asdfasdkfjpqieuyrqpiuweyr'adfj;adsfasdklfj asdfkjasdfa asdfasdf asdfj asdfjasldkfjasdf qqerjqwker qwer jqwer;kqwjerpqweur03847501234786510345b asdf");
Query OK, 1 row affected (0.01 sec)

mysql> check table prefix_test;
+------------------+-------+----------+----------+
| Table            | Op    | Msg_type | Msg_text |
+------------------+-------+----------+----------+
| test.prefix_test | check | status   | OK       |
+------------------+-------+----------+----------+
1 row in set (0.01 sec)

mysql>
[23 May 2007 17:59] Heikki Tuuri
This patch introduced the bug to 5.1:

Author: marko
Date: 2006-09-06 13:23:47 +0300 (Wed, 06 Sep 2006)
New Revision: 799

Log:
On data tuples being updated or inserted (but not searched for),
set type->len to the prefix_len of the index column, if it is a prefix index.
This should prevent bugs similar to Bug #21638 from occurring.

dict_index_copy_types(): Set type->len to prefix_len if prefix_len != 0.

row_build_index_entry(): Set type->len to prefix_len if prefix_len != 0,
also when the column in the tuple is SQL NULL, because the type information
may be used for interpreting other records during btr_page_reorganize().

----------------------

Modified:
   trunk/dict/dict0dict.c
   trunk/row/row0row.c

Modified: trunk/dict/dict0dict.c
===================================================================
--- trunk/dict/dict0dict.c	2006-09-05 20:41:05 UTC (rev 798)
+++ trunk/dict/dict0dict.c	2006-09-06 10:23:47 UTC (rev 799)
@@ -1723,8 +1723,6 @@
 	dict_index_t*	index,		/* in: index */
 	ulint		n_fields)	/* in: number of field types to copy */
 {
-	dtype_t*	dfield_type;
-	dtype_t*	type;
 	ulint		i;
 
 	if (UNIV_UNLIKELY(index->type & DICT_UNIVERSAL)) {
@@ -1734,10 +1732,17 @@
 	}
 
 	for (i = 0; i < n_fields; i++) {
+		dict_field_t*	ifield;
+		dtype_t*	dfield_type;
+		dtype_t*	type;
+
+		ifield = dict_index_get_nth_field(index, i);
 		dfield_type = dfield_get_type(dtuple_get_nth_field(tuple, i));
-		type = dict_col_get_type(dict_field_get_col
-					 (dict_index_get_nth_field(index, i)));
+		type = dict_col_get_type(dict_field_get_col(ifield));
 		*dfield_type = *type;
+		if (UNIV_UNLIKELY(ifield->prefix_len)) {
+			dfield_type->len = ifield->prefix_len;
+		}
 	}
 }
 

Modified: trunk/row/row0row.c
===================================================================
--- trunk/row/row0row.c	2006-09-05 20:41:05 UTC (rev 798)
+++ trunk/row/row0row.c	2006-09-06 10:23:47 UTC (rev 799)
@@ -143,17 +143,22 @@
 		dfield_copy(dfield, dfield2);
 
 		/* If a column prefix index, take only the prefix */
-		if (ind_field->prefix_len > 0
-		    && dfield_get_len(dfield2) != UNIV_SQL_NULL) {
+		if (ind_field->prefix_len) {
+			if (dfield_get_len(dfield2) != UNIV_SQL_NULL) {
 
-			cur_type = dict_col_get_type
-				(dict_field_get_col(ind_field));
+				cur_type = dict_col_get_type
+					(dict_field_get_col(ind_field));
 
-			storage_len = dtype_get_at_most_n_mbchars
-				(cur_type, ind_field->prefix_len,
-				 dfield_get_len(dfield2), dfield2->data);
+				storage_len = dtype_get_at_most_n_mbchars
+					(cur_type,
+					 ind_field->prefix_len,
+					 dfield_get_len(dfield2),
+					 dfield2->data);
 
-			dfield_set_len(dfield, storage_len);
+				dfield_set_len(dfield, storage_len);
+			}
+
+			dfield_get_type(dfield)->len = ind_field->prefix_len;
 		}
 	}
[23 May 2007 18:03] Heikki Tuuri
Assigning this to Marko. Are there also other places where we assume dfield type len == column type len? That is not true for column prefix indexes.
[24 May 2007 11:42] Marko Mäkelä
Using the attached bug28138.c, I can repeat this on 5.1 even after reverting r799. So I believe that it can be repeated with this test program on 5.0 as well.
[24 May 2007 11:49] Shane Bester
Marko, yes 5.0 is affected too.  I uploaded my 5.0.42 error logs in www.zip
[24 May 2007 12:42] Marko Mäkelä
We could reproduce the crash with 5.0 too. It looks like this bug was caused by the fix of Bug #21638. When we revert r791,799 from our 5.1 tree or r792 from our 5.0 tree, the test case bug28138.c will not trigger corruption or crash.
[24 May 2007 12:59] Marko Mäkelä
Revert Bug #21638 fix from 5.0

Attachment: 5.0-r792.patch (text/x-diff), 3.88 KiB.

[24 May 2007 13:02] Marko Mäkelä
Revert Bug #21638 fix from 5.1

Attachment: 5.1-r791-799.patch (text/x-diff), 5.36 KiB.

[12 Jul 2007 17:28] Timothy Smith
Queued to 5.0- and 5.1-maint team trees
[19 Jul 2007 15:47] Bugs System
Pushed into 5.1.21-beta
[19 Jul 2007 15:49] Bugs System
Pushed into 5.0.48
[22 Jul 2007 18:17] Paul Dubois
Noted in 5.0.48, 5.1.21 changelogs.

Indexing column prefixes in InnoDB tables could cause table
corruption.
[12 Sep 2007 23:55] James Day
A symptom of this bug can also be error log entries like this:

InnoDB: error in sec index entry update in
InnoDB: index `PrefixIndex` of table `dbname/Content`
InnoDB: tuple DATA TUPLE: 3 fields;

InnoDB crashing is also possible.

Since this causes tablespace corruption you should remove the affected index or replace with a full length version as soon as possible, or upgrade to a version with the bug fix. If you do see corruption, drop the index.
[5 May 2010 15:19] Bugs System
Pushed into 5.1.47 (revid:joro@sun.com-20100505145753-ivlt4hclbrjy8eye) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[6 May 2010 2:58] Paul Dubois
Push resulted from incorporation of InnoDB tree. No changes pertinent to this bug. Re-closing.
[28 May 2010 5:54] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100524190136-egaq7e8zgkwb9aqi) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (pib:16)
[28 May 2010 6:23] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20100524190941-nuudpx60if25wsvx) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[28 May 2010 6:51] Bugs System
Pushed into 5.5.5-m3 (revid:alik@sun.com-20100524185725-c8k5q7v60i5nix3t) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[29 May 2010 2:51] Paul Dubois
Push resulted from incorporation of InnoDB tree. No changes pertinent to this bug.
Re-closing.
[17 Jun 2010 11:55] Bugs System
Pushed into 5.1.47-ndb-7.0.16 (revid:martin.skold@mysql.com-20100617114014-bva0dy24yyd67697) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[17 Jun 2010 12:33] Bugs System
Pushed into 5.1.47-ndb-6.2.19 (revid:martin.skold@mysql.com-20100617115448-idrbic6gbki37h1c) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[17 Jun 2010 13:21] Bugs System
Pushed into 5.1.47-ndb-6.3.35 (revid:martin.skold@mysql.com-20100617114611-61aqbb52j752y116) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)