Bug #16582 InnoDB: Error in an adaptive hash index pointer to page
Submitted: 17 Jan 2006 21:35 Modified: 2 Feb 2006 18:40
Reporter: Heikki Tuuri
Status: Closed
Category:Server: InnoDB Severity:S2 (Serious)
Version:All up to 5.0.18 OS:Any (all)
Assigned to: Marko Mäkelä Target Version:

[17 Jan 2006 21:35] Heikki Tuuri
Description:
InnoDB prints to the .err log entries of the type:

"
060115 21:25:39 InnoDB: Error in an adaptive hash index pointer to page 
13473
ptr mem address 0x44262ee8 index id 0 210, node fold  yyyyyyyyyyy, rec 
fold zzzzzzzzzzzzzz
"

Table structure:

CREATE TABLE `t` (
`a` int(10) unsigned NOT NULL,
`b` int(10) unsigned NOT NULL,
PRIMARY KEY (`b`,`a`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 DELAY_KEY_WRITE=1 
ROW_FORMAT=REDUNDANT 

How to repeat:
Do not know. Could try INSERTs  or UPDATEs to the table, where b has only a few distinct
values, and we let column a values to grow one by one. Then InnoDB will build an adaptive
hash index on the first column plus 3 bytes of the next column.

More diagnostics could be added to adaptive hash operations.

Suggested fix:
N/A
[18 Jan 2006 9:32] Marko Mäkelä
Tried to reproduce with INSERTs of 1,000,000 rows followed by three UPDATEs of 333,333
rows each. No success yet.
[19 Jan 2006 9:40] Marko Mäkelä
Got it:
1st node fold:
3303713354 = (0 210, 00000002 00003ab1)
2nd node fold:
3309575680 = (0 210, 00000002 00003a67)
rec fold:
2862119956 = (0 210, 00000002 00003a)

That is, the node fold values have been computed with n_fields=2, n_bytes=0, while the
record fold values were computed with n_fields=1, n_bytes=3.
[19 Jan 2006 22:27] Heikki Tuuri
Hi!

There is a race condition in btr_search_drop_page_hash_index() that could at least in
theory explain this. Suppose our thread has an S-latch on the page.

In that function we first store some values and release btr_search_latch:

"
        n_fields = block->curr_n_fields;
        n_bytes = block->curr_n_bytes;
        index = block->index;

        /* NOTE: The fields of block must not be accessed after
        releasing btr_search_latch, as the index page might only
        be s-latched! */

        rw_lock_s_unlock(&btr_search_latch);
"

Then we calculate the 'fold' values based on these. But in the meantime, someone else may
come, drop the page hash index, and build a new one with DIFFERENT curr_n_fields etc.

Then our thread continues:

"
        rw_lock_x_lock(&btr_search_latch);

        for (i = 0; i < n_cached; i++) {

                ha_remove_all_nodes_to_page(table, folds[i], page);
        }

        block->is_hashed = FALSE;
        block->index = NULL;
...
"

It sets block->is_hashed = FALSE, even though it has NOT dropped the newly built hash
index on the page.

This inconsistency in the adaptive hash index can happen if both our thread and the other
thread are doing a SELECT query, and they happen to get different n_fields etc. in the
code fragment below. This is highly improbable unless the server is very loaded and
threads can stall for a long time.

"
        if (build_index) {
                /* Note that since we did not protect block->n_fields etc.
                with any semaphore, the values can be inconsistent. We have
                to check inside the function call that they make sense. We
                also malloc an array and store the values there to make sure
                the compiler does not let the function call parameters change
                inside the called function. It might be that the compiler
                would optimize the call just to pass pointers to block. */

                params = mem_alloc(3 * sizeof(ulint));
                params[0] = block->n_fields;
                params[1] = block->n_bytes;
                params[2] = block->side;

                /* Make sure the compiler cannot deduce the values and do
                optimizations */

                params2 = params + btr_search_this_is_zero;

                btr_search_build_page_hash_index(cursor->index,
                                                block->frame,
                                                params2[0],
                                                params2[1],
                                                params2[2]);
                mem_free(params);
        }
}
"

Regards,

Heikki
[19 Jan 2006 22:40] Heikki Tuuri
Hi!

Some analysis of the page 13473 hash corruption reported by the user.

On Jan 14, the page was like this: about 480 records inserted in ascending order by a
single transaction. No deleted records in the page record heap. There were dangling
adaptive hash index pointers to every record, or almost every record on the page. Marko's
analysis suggests they may have been hash nodes built based on 2 columns in the table,
probably to do unique searches in the table,

Since there was no evidence of any update etc. on the page, this suggests the bug is
triggered by pure SELECT queries.

The corruption magically healed itself in the next 12 hours. It moved to another page
instead. This healing is easily explained if a new hash index was built on page 13473
again, this time with 2 columns. Then everything looked ok again,

Then mysqld was restarted, but corruption soon reappeared to page 13473. The hex dump of
page shows that new inserts had been made to it, and about 30 records from the end moved
to a new page in a split.

Regards,

Heikki
[19 Jan 2006 23:27] Heikki Tuuri
Fixed btr0sea.c for 5.0.18

Attachment: btr0sea.c (text/plain), 41.88 KiB.

[19 Jan 2006 23:28] Heikki Tuuri
I have now attached btr0sea.c that I edited from 5.0.18. It contains a fix to the race
condition I found, and also diagnostic prints if the corruption problem still persists.

Regards,

Heikki
[24 Jan 2006 12:15] Marko Mäkelä
The bug has always been in MySQL/InnoDB, but it occurs very rarely. I will shortly post a
patch to MySQL/InnoDB 4.0, in case a new 4.0 release will ever be built. We will
definitely fix this for MySQL/InnoDB 4.1 and later versions.
[24 Jan 2006 12:20] Marko Mäkelä
Apparently the bug reporting system is not transactional.
[24 Jan 2006 12:58] Marko Mäkelä
Patch for 4.0 and 4.1

Attachment: bug16582-fix4.1.patch (text/x-patch), 2.50 KiB.

[24 Jan 2006 13:29] Marko Mäkelä
Patch for 4.0 (the previous one only works for 4.1)

Attachment: bug16582-fix4.0.patch (text/x-patch), 1.52 KiB.

[30 Jan 2006 20:32] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/1889
[31 Jan 2006 13:28] Alexander Ivanov
Pushed to 4.1.19.
[31 Jan 2006 19:36] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/1958
[31 Jan 2006 19:48] Alexander Ivanov
Fixed in 5.0.19.
[2 Feb 2006 18:40] Mike Hillyer
Documented in 4.1.19 and 5.1.19 changelogs:

      <listitem>
        <para>
          Corrected race condition when dropping the adaptive
          hash index for a B-tree page in InnoDB. (Bug #16582)
        </para>
      </listitem>