| Bug #16582 | InnoDB: Error in an adaptive hash index pointer to page | ||
|---|---|---|---|
| Submitted: | 17 Jan 2006 21:35 | Modified: | 2 Feb 2006 18:40 |
| Reporter: | Heikki Tuuri | ||
| Status: | Closed | ||
| Category: | Server: InnoDB | Severity: | S2 (Serious) |
| Version: | All up to 5.0.18 | OS: | Any (all) |
| Assigned to: | Marko Mäkelä | Target Version: | |
[17 Jan 2006 21:35]
Heikki Tuuri
[18 Jan 2006 9:32]
Marko Mäkelä
Tried to reproduce with INSERTs of 1,000,000 rows followed by three UPDATEs of 333,333 rows each. No success yet.
[19 Jan 2006 9:40]
Marko Mäkelä
Got it: 1st node fold: 3303713354 = (0 210, 00000002 00003ab1) 2nd node fold: 3309575680 = (0 210, 00000002 00003a67) rec fold: 2862119956 = (0 210, 00000002 00003a) That is, the node fold values have been computed with n_fields=2, n_bytes=0, while the record fold values were computed with n_fields=1, n_bytes=3.
[19 Jan 2006 22:27]
Heikki Tuuri
Hi!
There is a race condition in btr_search_drop_page_hash_index() that could at least in
theory explain this. Suppose our thread has an S-latch on the page.
In that function we first store some values and release btr_search_latch:
"
n_fields = block->curr_n_fields;
n_bytes = block->curr_n_bytes;
index = block->index;
/* NOTE: The fields of block must not be accessed after
releasing btr_search_latch, as the index page might only
be s-latched! */
rw_lock_s_unlock(&btr_search_latch);
"
Then we calculate the 'fold' values based on these. But in the meantime, someone else may
come, drop the page hash index, and build a new one with DIFFERENT curr_n_fields etc.
Then our thread continues:
"
rw_lock_x_lock(&btr_search_latch);
for (i = 0; i < n_cached; i++) {
ha_remove_all_nodes_to_page(table, folds[i], page);
}
block->is_hashed = FALSE;
block->index = NULL;
...
"
It sets block->is_hashed = FALSE, even though it has NOT dropped the newly built hash
index on the page.
This inconsistency in the adaptive hash index can happen if both our thread and the other
thread are doing a SELECT query, and they happen to get different n_fields etc. in the
code fragment below. This is highly improbable unless the server is very loaded and
threads can stall for a long time.
"
if (build_index) {
/* Note that since we did not protect block->n_fields etc.
with any semaphore, the values can be inconsistent. We have
to check inside the function call that they make sense. We
also malloc an array and store the values there to make sure
the compiler does not let the function call parameters change
inside the called function. It might be that the compiler
would optimize the call just to pass pointers to block. */
params = mem_alloc(3 * sizeof(ulint));
params[0] = block->n_fields;
params[1] = block->n_bytes;
params[2] = block->side;
/* Make sure the compiler cannot deduce the values and do
optimizations */
params2 = params + btr_search_this_is_zero;
btr_search_build_page_hash_index(cursor->index,
block->frame,
params2[0],
params2[1],
params2[2]);
mem_free(params);
}
}
"
Regards,
Heikki
[19 Jan 2006 22:40]
Heikki Tuuri
Hi! Some analysis of the page 13473 hash corruption reported by the user. On Jan 14, the page was like this: about 480 records inserted in ascending order by a single transaction. No deleted records in the page record heap. There were dangling adaptive hash index pointers to every record, or almost every record on the page. Marko's analysis suggests they may have been hash nodes built based on 2 columns in the table, probably to do unique searches in the table, Since there was no evidence of any update etc. on the page, this suggests the bug is triggered by pure SELECT queries. The corruption magically healed itself in the next 12 hours. It moved to another page instead. This healing is easily explained if a new hash index was built on page 13473 again, this time with 2 columns. Then everything looked ok again, Then mysqld was restarted, but corruption soon reappeared to page 13473. The hex dump of page shows that new inserts had been made to it, and about 30 records from the end moved to a new page in a split. Regards, Heikki
[19 Jan 2006 23:27]
Heikki Tuuri
Fixed btr0sea.c for 5.0.18
Attachment: btr0sea.c (text/plain), 41.88 KiB.
[19 Jan 2006 23:28]
Heikki Tuuri
I have now attached btr0sea.c that I edited from 5.0.18. It contains a fix to the race condition I found, and also diagnostic prints if the corruption problem still persists. Regards, Heikki
[24 Jan 2006 12:15]
Marko Mäkelä
The bug has always been in MySQL/InnoDB, but it occurs very rarely. I will shortly post a patch to MySQL/InnoDB 4.0, in case a new 4.0 release will ever be built. We will definitely fix this for MySQL/InnoDB 4.1 and later versions.
[24 Jan 2006 12:20]
Marko Mäkelä
Apparently the bug reporting system is not transactional.
[24 Jan 2006 12:58]
Marko Mäkelä
Patch for 4.0 and 4.1
Attachment: bug16582-fix4.1.patch (text/x-patch), 2.50 KiB.
[24 Jan 2006 13:29]
Marko Mäkelä
Patch for 4.0 (the previous one only works for 4.1)
Attachment: bug16582-fix4.0.patch (text/x-patch), 1.52 KiB.
[30 Jan 2006 20:32]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/1889
[31 Jan 2006 13:28]
Alexander Ivanov
Pushed to 4.1.19.
[31 Jan 2006 19:36]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/1958
[31 Jan 2006 19:48]
Alexander Ivanov
Fixed in 5.0.19.
[2 Feb 2006 18:40]
Mike Hillyer
Documented in 4.1.19 and 5.1.19 changelogs:
<listitem>
<para>
Corrected race condition when dropping the adaptive
hash index for a B-tree page in InnoDB. (Bug #16582)
</para>
</listitem>
