MySQL Bugs: #16582: InnoDB: Error in an adaptive hash index pointer to page

Bug #16582	InnoDB: Error in an adaptive hash index pointer to page
Submitted:	17 Jan 2006 20:35	Modified:	19 Jun 2010 0:04
Reporter:	Heikki Tuuri	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S2 (Serious)
Version:	All up to 5.0.18	OS:	Any (all)
Assigned to:	Marko Mäkelä	CPU Architecture:	Any

Description:
InnoDB prints to the .err log entries of the type:

"
060115 21:25:39 InnoDB: Error in an adaptive hash index pointer to page 
13473
ptr mem address 0x44262ee8 index id 0 210, node fold  yyyyyyyyyyy, rec 
fold zzzzzzzzzzzzzz
"

Table structure:

CREATE TABLE `t` (
`a` int(10) unsigned NOT NULL,
`b` int(10) unsigned NOT NULL,
PRIMARY KEY (`b`,`a`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 DELAY_KEY_WRITE=1 
ROW_FORMAT=REDUNDANT 

How to repeat:
Do not know. Could try INSERTs  or UPDATEs to the table, where b has only a few distinct values, and we let column a values to grow one by one. Then InnoDB will build an adaptive hash index on the first column plus 3 bytes of the next column.

More diagnostics could be added to adaptive hash operations.

Suggested fix:
N/A

Tried to reproduce with INSERTs of 1,000,000 rows followed by three UPDATEs of 333,333 rows each. No success yet.

Got it:
1st node fold:
3303713354 = (0 210, 00000002 00003ab1)
2nd node fold:
3309575680 = (0 210, 00000002 00003a67)
rec fold:
2862119956 = (0 210, 00000002 00003a)

That is, the node fold values have been computed with n_fields=2, n_bytes=0, while the record fold values were computed with n_fields=1, n_bytes=3.

Hi!

There is a race condition in btr_search_drop_page_hash_index() that could at least in theory explain this. Suppose our thread has an S-latch on the page.

In that function we first store some values and release btr_search_latch:

"
        n_fields = block->curr_n_fields;
        n_bytes = block->curr_n_bytes;
        index = block->index;

        /* NOTE: The fields of block must not be accessed after
        releasing btr_search_latch, as the index page might only
        be s-latched! */

        rw_lock_s_unlock(&btr_search_latch);
"

Then we calculate the 'fold' values based on these. But in the meantime, someone else may come, drop the page hash index, and build a new one with DIFFERENT curr_n_fields etc.

Then our thread continues:

"
        rw_lock_x_lock(&btr_search_latch);

        for (i = 0; i < n_cached; i++) {

                ha_remove_all_nodes_to_page(table, folds[i], page);
        }

        block->is_hashed = FALSE;
        block->index = NULL;
...
"

It sets block->is_hashed = FALSE, even though it has NOT dropped the newly built hash index on the page.

This inconsistency in the adaptive hash index can happen if both our thread and the other thread are doing a SELECT query, and they happen to get different n_fields etc. in the code fragment below. This is highly improbable unless the server is very loaded and threads can stall for a long time.

"
        if (build_index) {
                /* Note that since we did not protect block->n_fields etc.
                with any semaphore, the values can be inconsistent. We have
                to check inside the function call that they make sense. We
                also malloc an array and store the values there to make sure
                the compiler does not let the function call parameters change
                inside the called function. It might be that the compiler
                would optimize the call just to pass pointers to block. */

                params = mem_alloc(3 * sizeof(ulint));
                params[0] = block->n_fields;
                params[1] = block->n_bytes;
                params[2] = block->side;

                /* Make sure the compiler cannot deduce the values and do
                optimizations */

                params2 = params + btr_search_this_is_zero;

                btr_search_build_page_hash_index(cursor->index,
                                                block->frame,
                                                params2[0],
                                                params2[1],
                                                params2[2]);
                mem_free(params);
        }
}
"

Regards,

Heikki

Hi!

Some analysis of the page 13473 hash corruption reported by the user.

On Jan 14, the page was like this: about 480 records inserted in ascending order by a single transaction. No deleted records in the page record heap. There were dangling adaptive hash index pointers to every record, or almost every record on the page. Marko's analysis suggests they may have been hash nodes built based on 2 columns in the table, probably to do unique searches in the table,

Since there was no evidence of any update etc. on the page, this suggests the bug is triggered by pure SELECT queries.

The corruption magically healed itself in the next 12 hours. It moved to another page instead. This healing is easily explained if a new hash index was built on page 13473 again, this time with 2 columns. Then everything looked ok again,

Then mysqld was restarted, but corruption soon reappeared to page 13473. The hex dump of page shows that new inserts had been made to it, and about 30 records from the end moved to a new page in a split.

Regards,

Heikki

Fixed btr0sea.c for 5.0.18

Attachment: btr0sea.c (text/plain), 41.88 KiB.

I have now attached btr0sea.c that I edited from 5.0.18. It contains a fix to the race condition I found, and also diagnostic prints if the corruption problem still persists.

Regards,

Heikki

The bug has always been in MySQL/InnoDB, but it occurs very rarely. I will shortly post a patch to MySQL/InnoDB 4.0, in case a new 4.0 release will ever be built. We will definitely fix this for MySQL/InnoDB 4.1 and later versions.

Apparently the bug reporting system is not transactional.

Patch for 4.0 and 4.1

Attachment: bug16582-fix4.1.patch (text/x-patch), 2.50 KiB.

Patch for 4.0 (the previous one only works for 4.1)

Attachment: bug16582-fix4.0.patch (text/x-patch), 1.52 KiB.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/1889

Pushed to 4.1.19.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/1958

Fixed in 5.0.19.

Documented in 4.1.19 and 5.1.19 changelogs:

      <listitem>
        <para>
          Corrected race condition when dropping the adaptive
          hash index for a B-tree page in InnoDB. (Bug #16582)
        </para>
      </listitem>

a python script to trigger the bug

Attachment: trigger.py (text/x-python), 2.26 KiB.

Hello, 

My name is Wei and I am 2nd year grad student at UW, Madison. I am doing some research on concurrency bug detection. I am trying to repeat the MySQL bug 16582 to better understand it. But I am not really sure how to trigger it.

I have created the database and its table as suggested:
CREATE TABLE `t` (
`a` int(10) unsigned NOT NULL,
`b` int(10) unsigned NOT NULL,
PRIMARY KEY (`b`,`a`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 DELAY_KEY_WRITE=1 
ROW_FORMAT=REDUNDANT 

And also I have written the script to trigger the bug as suggested, I inserted 1 million records, of which a is from 0 to 999,999, and b is the mod of a and 3(i.e., 0,1,2). After the insertion, I would do 3 rounds of updating(update 333,333 records in each round, only leaving  the record (a = 999,999 b = 0) intact; in each round, a will be added with 1,000,000 while b stays the same.(i.e. a now is from 999,999 to 1,999,998 while ).  However, I haven't been able to trigger the bug. I am wondering could someone shed some light on how to repeat/trigger this bug? I read that the pure select query could trigger the bug and with a much smaller input size. I am wondering what that is supposed to look like? 

Thank you very much!

Wei

Btw, the attached file is the script I wrote to trigger the bug(but haven't been able to trigger it), just for the reference.

Wei, I believe that you will have to post suitable SELECT queries to the server. Otherwise it will not build any adaptive hash index. A hash index may be built in btr_search_build_page_hash_index() if btr_search_update_block_hash_info() returns TRUE to btr_search_info_update_slow().

Hi Marko,

Thank you very much for your response! 

After examining the code for a while. I created the database as Heikki suggested:
CREATE TABLE `t` (
`a` int(10) unsigned NOT NULL,
`b` int(10) unsigned NOT NULL,
PRIMARY KEY (`b`,`a`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 DELAY_KEY_WRITE=1 
ROW_FORMAT=REDUNDANT 
And a is from 0 to 999,999 in the table, whereas b is a % 3 (so b alternates among 0,1,2) 

I set up two threads doing "SELECT" queries. Each thread will issue 1,000,000 unique queries to select one single record from the table until the end of the table ( i.e. select * from t where a=0,b=0 ... select * from t where a= 999,999 b= 0). In the meantime, there is another thread that does "check table testdb.t" (to force the code path which could lead to printing out the error message).  I also added a sleep before the block->is_hashed = FALSE inside  btr_search_drop_page_hash_index() function, in the hope that some other queries from another thread may get in , build up index, and set the "TRUE" value on the block->is_hashed, which be overwritten by  "block->is_hashed = FALSE" later in this thread.

In gdb, I found out that the "select" queries I posed did generate some adaptive indices.(the code path is like what Marko has suggested) However, after the intensive input(each thread posts 1 million select queries, while there are 500,000 check table commands issued by the third thread), the bug still _cannot_ be triggered. I realized there is a BTR_SEARCH_BUILD_LIMIT parameter that has something to do with building the hash index. Not exactly knowing the hash(folding) mechanism, I also tried to change this value to be 1, in the hope that more indices might be generated, however, I still couldn't succeed.

I am wondering could you educate me a little more about the adaptive hash index mechanism ? Preferably, under what circumstances, this particular bug would be easier to trigger ?

Thank you very much!

Wei

P.S.:
The below stacktrace is the one I got when doing select. If I understand correctly, in btr_search_drop_page_hash_index() the block->is_hashed is set to FALSE, and this could race with another thread(following the same stack trace) setting such value to TRUE in the btr_search_build_page_hash_index() function. I am not sure if I am right ?

#0  btr_search_drop_page_hash_index (page=0xb7614000 "\215R") at btr0sea.c:1007
#1  0x082d3d6e in btr_search_build_page_hash_index (index=0xb7e1a168, page=0xb7614000 "\215R", n_fields=1, n_bytes=0, side=2) at btr0sea.c:1108
#2  0x082d161b in btr_search_info_update_slow (info=0xb7e1a538, cursor=0xb4a4e6b0) at btr0sea.c:515
#3  0x082c4dfa in btr_cur_search_to_nth_level (index=0xb7e1a168, level=0, tuple=0xb7e15268, mode=4, latch_mode=2, cursor=0xb4a4e6b0, has_search_latch=0, mtr=0xb4a4e250) at ./../include/btr0sea.ic:66
#4  0x08298271 in row_ins_index_entry_low (mode=2, index=0xb7e1a168, entry=0xb7e15268, ext_vec=0x0, n_ext_vec=0, thr=0xb7e1cd80) at row0ins.c:1966
#5  0x082987f0 in row_ins_index_entry (index=0xb7e1a168, entry=0xb7e15268, ext_vec=0x0, n_ext_vec=0, thr=0xb7e1cd80) at row0ins.c:2135
#6  0x082988ce in row_ins_index_entry_step (node=0xb7e1cb88, thr=0xb7e1cd80) at row0ins.c:2214
#7  0x08298afc in row_ins (node=0xb7e1cb88, thr=0xb7e1cd80) at row0ins.c:2346
#8  0x08298c3b in row_ins_step (thr=0xb7e1cd80) at row0ins.c:2450
#9  0x0829a199 in row_insert_for_mysql (mysql_rec=0xb1007b78 "]\004", prebuilt=0xb7e1b068) at row0mysql.c:1141
#10 0x0820df0a in ha_innobase::write_row (this=0xb1007a50, record=0xb1007b78 "]\004") at ha_innodb.cc:3431
#11 0x081b9ec2 in write_record (thd=0xb1000468, table=0xb1007438, info=0xb4a4ea50) at sql_insert.cc:1146
#12 0x081b8dc6 in mysql_insert (thd=0xb1000468, table_list=0x8ffab70, fields=@0xb1000920, values_list=@0xb1000944, update_fields=@0xb1000938, update_values=@0xb100092c, duplic=DUP_ERROR, ignore=false) at sql_insert.cc:531
#13 0x0816a40c in mysql_execute_command (thd=0xb1000468) at sql_parse.cc:3266
#14 0x0816f9f8 in mysql_parse (thd=0xb1000468, inBuf=0x8ffaae0 "insert into testdb.t values (1117, 1)", length=37) at sql_parse.cc:5593
#15 0x08166d73 in dispatch_command (command=COM_QUERY, thd=0xb1000468, packet=0xb1003fe1 "insert into testdb.t values (1117, 1)", packet_length=38) at sql_parse.cc:1709
#16 0x08166715 in do_command (thd=0xb1000468) at sql_parse.cc:1510
#17 0x081659c3 in handle_one_connection (arg=0xb1000468) at sql_parse.cc:1155
#18 0x00d9973b in start_thread () from /lib/libpthread.so.0
#19 0x00ceecfe in clone () from /lib/libc.so.6

Hi Marko,

I have a question: I am curious about what blocks does btr_search_validate will look at? Is there some certain logic to guarantee that btr_search_validate() could look at a specific block ?

The following is some of my reasoning:

After adding some control flow and sleep, I did see that the block->is_hashed can be changed by the other thread between two atomic blocks. Now, to trigger the bug is to make sure the btr_search_validate() is being called by the other thread. I realized there were two ways to call btr_search_validate(), one way is to issue the query "check table xxx", the other way is the server's automatic trx_purge() per 10 seconds.

I changed the code so that a thread constantly sending a "check table xxx" query while the server issues trx_purge every 0.01 seconds. In the meantime, whenever the block->is_hashed is false(in the region where no btr_search_latch is held), I sleep for a second. However, it appears that in the btr_search_validate() routine, it couldn't get this block. So here comes my question: Is there some certain logic to guarantee that btr_search_validate() could look at a specific block ?

Thank you very much!

Wei

Pushed into 5.1.47 (revid:joro@sun.com-20100505145753-ivlt4hclbrjy8eye) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)

Push resulted from incorporation of InnoDB tree. No changes pertinent to this bug. Re-closing.

Pushed into mysql-next-mr (revid:alik@sun.com-20100524190136-egaq7e8zgkwb9aqi) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (pib:16)

Pushed into 6.0.14-alpha (revid:alik@sun.com-20100524190941-nuudpx60if25wsvx) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)

Pushed into 5.5.5-m3 (revid:alik@sun.com-20100524185725-c8k5q7v60i5nix3t) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)

Push resulted from incorporation of InnoDB tree. No changes pertinent to this bug.
Re-closing.

Pushed into 5.1.47-ndb-7.0.16 (revid:martin.skold@mysql.com-20100617114014-bva0dy24yyd67697) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)

Pushed into 5.1.47-ndb-6.2.19 (revid:martin.skold@mysql.com-20100617115448-idrbic6gbki37h1c) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)

Pushed into 5.1.47-ndb-6.3.35 (revid:martin.skold@mysql.com-20100617114611-61aqbb52j752y116) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)