Bug #77401 | btr_search_drop_page_hash_index() accesses potentially freed dict_index_t | ||
---|---|---|---|
Submitted: | 18 Jun 2015 8:48 | Modified: | 12 Aug 2015 13:50 |
Reporter: | Marko Mäkelä | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server: InnoDB storage engine | Severity: | S2 (Serious) |
Version: | 5.0.3 | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[18 Jun 2015 8:48]
Marko Mäkelä
[18 Jun 2015 9:04]
Marko Mäkelä
Posted by developer: Workaround: --skip-innodb-adaptive-hash-index
[12 Aug 2015 13:47]
Marko Mäkelä
Posted by developer: I wanted to fix this perceived problem by introducing an optional parameter "bool hazard=false" to the function btr_search_drop_page_hash_index(). It would only be passed as hazard=true by buf_LRU_free_page(), which I verified to be the only caller that is not necessarily holding any MDL or table handle on the table that the being-freed page belongs to. To obtain proof that my patch is needed for correctness, I applied a simpler patch, adding some diagnostics to btr_search_drop_page_hash_index(): diff --git a/storage/innobase/btr/btr0sea.cc b/storage/innobase/btr/btr0sea.cc index 6aa0650..3452816 100644 --- a/storage/innobase/btr/btr0sea.cc +++ b/storage/innobase/btr/btr0sea.cc @@ -1218,6 +1218,15 @@ retry: || rw_lock_own(&block->lock, RW_LOCK_S) || rw_lock_own(&block->lock, RW_LOCK_X)); + if (index->table->n_ref_count == 0 && !mutex_own(&dict_sys->mutex)) { + fprintf(stderr, "InnoDB: Warning: hazard: %s\n", + index->table->name.m_name); + if (strstr(index->table->name.m_name, "test/t") || + strstr(index->table->name.m_name, "mysql/")) { + os_thread_sleep(100000000);/*100 seconds*/ + } + } + /* We must not dereference index here, because it could be freed if (index->table->n_ref_count == 0 && !mutex_own(&dict_sys->mutex)). Determine the ahi_slot based on the block contents. */ A variation of this instrumentation (replacing the sleep with an assertion failure) was triggered by some tests, such as these ones: innodb_zip.wl6347_comp_indx_stat [ pass ] 26504 innodb_zip.wl6470_2 [ fail ] Test ended at 2015-08-12 15:15:13 CURRENT_TEST: innodb_zip.wl6470_2 mysqltest: At line 245: query 'reap' failed: 2013: Lost connection to MySQL server during query The current statement is: update t5 set c1 = c1 + 1000 where c1 > 10; I was able to merge these two test files and reduce them to a single test case that runs with --manual-gdb, for a different statement: delete from t5 where c1 > 10 Both times, we try to evict AHI entries for the table mysql.innodb_index_stats. In the active gdb session, we have two active threads (the others are idle, waiting for user input or internal events): Thread 26: delete from t5 where c1 > 10 Thread 25: delete from t2 where c1 > 10 The stack trace of Thread 26 is like this: #2 btr_search_drop_page_hash_index #3 buf_LRU_free_page (bpage=0x7fffde6c1778, zip=true) #4 buf_LRU_free_from_common_LRU_list (buf_pool=0x2d799f8, scan_all=false) #5 buf_LRU_scan_and_free_block (buf_pool=0x2d799f8, scan_all=false) #6 buf_LRU_get_free_block (buf_pool=0x2d799f8) #7 buf_page_create #8 fsp_page_create #9 fsp_alloc_free_page (space=22, ...) #10 fseg_alloc_free_page_low #11 fseg_alloc_free_page_general #12 trx_undo_add_page Note: the space=22 corresponds to srv_tmp_space.m_space_id (allocating undo log for an operation on a TEMPORARY TABLE). It has nothing to do with the tablespace (14) of the table mysql.innodb_index_stats. Only the clustered index root page (14:3) has AHI entries. It is a single-page index with 7 entries. I let Thread 26 resume from the breakpoint that I had set at the sleep, and concurrently initiated a new connection like this: echo 'drop table mysql.innodb_index_stats;'| mysql --defaults-file=/dev/shm/t/mysql-test/var/my.cnf -A -uroot Then, the gdb session would show the following for the new Thread 27: #2 dict_index_remove_from_cache_low #3 dict_table_remove_from_cache_low #4 dict_table_remove_from_cache #5 row_drop_table_from_cache (tablename="mysql/innodb_index_stats",...) #6 row_drop_table_for_mysql (name="mysql/innodb_index_stats",...) #7 ha_innobase::delete_table (name="./mysql/innodb_index_stats") [New Thread 0x7fffde495700 (LWP 11029)] 2015-08-12T12:49:47.178869Z 5 [ERROR] InnoDB: Waited for 5 secs for hash index ref_count (1) to drop to 0. index: `PRIMARY` table: `mysql`.`innodb_index_stats` 2015-08-12T12:49:52.273839Z 5 [ERROR] InnoDB: Waited for 10 secs for hash index ref_count (1) to drop to 0. index: `PRIMARY` table: `mysql`.`innodb_index_stats` So, my reproduction attempt failed due to the above check in dict_index_remove_from_cache_low(). Let us check if it is possible (although improbable) that another thread invokes btr_search_drop_page_hash_index() on the same page (14:3) to make the dict_index_remove_from_cache_low() return. If we look at Thread 26, it is not holding any of the following: * MDL (otherwise, the DROP TABLE would not have reached ha_innobase::delete_table) * dict_sys->mutex * block->mutex (block->mutex.m_impl.is_owned() == false) * block->lock (block->lock->debug_list is empty) * block->page.buf_fix_count (it is 0) It would first seem that another thread could drop the AHI before our Thread 26 acquires the AHI latch for this slot. However, in this test scenario, we also have block->page.state==BUF_BLOCK_REMOVE_HASH and block->page.in_page_hash==false. The are set in buf_LRU_block_remove_hashed(). Now, the question is: Can buf_LRU_free_page() invoke btr_search_drop_page_hash_index() without first invoking buf_LRU_block_remove_hashed()? If not, we are safe. And the answer is indeed no, we are safe. (Clarification: buf_LRU_block_remove_hashed() makes the buf_block_t unaccessible to other threads by disconnecting it from the (space_id,page_no) lookup table and by changing its state to BUF_BLOCK_REMOVE_HASH.) So, it looks like this was an unfounded concern. I would not call it a wasted effort, because it was not obvious to me how this would work. It is hard to repeat this kind of scenarios in mysql-test-run (mtr), because mtr tests tend to be rather short, and they will always execute DROP TABLE to clean up after themselves. Therefore, there usually are not many big 'idle' tables in the buffer pool, and you do not tend to get this type of AHI eviction by buffer pool eviction; you would get it evicted by DROP TABLE or DROP DATABASE instead. In mtr, this type of scenario is basically only possible for shared predefined tables, such as data dictionary tables, the persistent statistics tables, or (theoretically) the GTID table if do not have too frequent updates to it.
[12 Aug 2015 14:10]
Marko Mäkelä
Posted by developer: It could seem that my analysis did not cover the scenario in the Description. The instrumentation that I introduced was at the start of the function btr_search_drop_page_hash_index(), before it acquires the AHI latch for the first time. The AHI latch would be released during the computation of the folds[] array, and reacquired after that. Most calls to btr_search_drop_page_hash_index() are protected by block->lock. The only exception is buf_LRU_free_page(), which would have disconnected the block from buf_page_get_gen() and invalidated the block state. Because of this, the only way to get multiple concurrent calls to btr_search_drop_page_hash_index() on the same block would be when each thread is holding block->lock S-latch. But, in this case the threads would also be holding an open table handle or MDL, preventing table metadata eviction and DROP TABLE.