MySQL Bugs: #63196: set innodb_adaptive_hash

Bug #63196	set innodb_adaptive_hash_index=0 hangs server
Submitted:	11 Nov 2011 6:19	Modified:	17 Jul 2013 14:14
Reporter:	Mark Callaghan	Email Updates:
Status:	Can't repeat	Impact on me:	None
Category:	MySQL Server: InnoDB Plugin storage engine	Severity:	S1 (Critical)
Version:	5.1.52	OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	adaptive, hang, hash, innodb

Description:
This was done on a running server that was too slow with lots of stalls where it appeared that:
1) many threads were waiting to lock btr_search_latch
2) holder of btr_search_latch was waiting on other mutexes

So I took a chance and disabled the AHI. That didn't work out too well

This is the stack for the thread trying to disable the AHI

Thread 284 (Thread 0x5e1e5940 (LWP 7526)):
#0  0x0000003278c0a496 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000000007aabf5 in os_event_wait_low (event=0x34841810, reset_sig_count=<value optimized out>) at os/os0sync.c:399
#2  0x00000000007ebaa0 in sync_array_wait_event (arr=0x1f474e20, index=11) at sync/sync0arr.c:451
#3  0x00000000007ec21a in rw_lock_s_lock_spin (lock=0x2aaabed26648, pass=<value optimized out>, file_name=0x9b8272 "btr/btr0sea.c", line=1244) at sync/sync0rw.c:436
#4  0x000000000082873a in rw_lock_s_lock_func (space=34, zip_size=8192, offset=263269, rw_latch=1, guess=0x0, mode=11, file=0x9b8272 "btr/btr0sea.c", line=1244, mtr=0x5e1e33c0) at ./incl
ude/sync0rw.ic:418
#5  buf_page_get_gen (space=34, zip_size=8192, offset=263269, rw_latch=1, guess=0x0, mode=11, file=0x9b8272 "btr/btr0sea.c", line=1244, mtr=0x5e1e33c0) at buf/buf0buf.c:2535
#6  0x000000000081f28d in btr_search_drop_page_hash_when_freed (space=34, zip_size=8192, page_no=263269) at btr/btr0sea.c:1243
#7  0x0000000000856895 in fseg_free_page_low (seg_inode=0x2aaaef21c272 "", space=34, zip_size=0, page=18446744073709551615, mtr=0x34841810) at fsp/fsp0fsp.c:3258
#8  0x0000000000856ea8 in fseg_free_page (seg_header=0x2aaaef25c04a "", space=34, page=263269, mtr=0x5e1e4590) at fsp/fsp0fsp.c:3395
#9  0x000000000080df20 in btr_compress (cursor=0x5e1e4000, mtr=0x5e1e4590) at btr/btr0btr.c:2810
#10 0x0000000000811ad3 in btr_cur_compress_if_useful (cursor=0x34841854, mtr=0xa) at btr/btr0cur.c:2769
#11 0x000000000081a2a5 in btr_cur_pessimistic_delete (err=0x5e1e40e8, has_reserved_extents=<value optimized out>, cursor=0x5e1e4000, rb_ctx=RB_NONE, mtr=0x5e1e4590) at btr/btr0cur.c:3012
#12 0x00000000007d2fe4 in row_purge_remove_sec_if_poss_low (node=<value optimized out>, index=0x2ab654065ab8, entry=0x2ab65819b748, mode=33) at row/row0purge.c:300
#13 0x00000000007d3ded in row_purge (thr=0x2ab689d4b0b0) at row/row0purge.c:335
#14 row_purge_step (thr=0x2ab689d4b0b0) at row/row0purge.c:700
#15 0x00000000007be64b in que_thr_step (thr=0x2ab689d4b0b0) at que/que0que.c:1248
#16 que_run_threads_low (thr=0x2ab689d4b0b0) at que/que0que.c:1306
#17 que_run_threads (thr=0x2ab689d4b0b0) at que/que0que.c:1343
#18 0x00000000007f0cff in trx_purge_worker (worker_id=<value optimized out>) at trx/trx0purge.c:1238
#19 0x00000000007e4f71 in srv_purge_worker_thread (arg=<value optimized out>) at srv/srv0srv.c:3587
#20 0x0000003278c062f7 in start_thread () from /lib64/libpthread.so.0
#21 0x00000032780d1e3d in clone () from /lib64/libc.so.6
#22 0x0000000000000000 in ?? ()

In that case thread was at buf_page_get_gen in
btr_search_drop_page_hash_when_freed(
/*=================================*/
        ulint   space,          /*!< in: space id */
        ulint   zip_size,       /*!< in: compressed page size in bytes
                                or 0 for uncompressed pages */
        ulint   page_no)        /*!< in: page number */
{
        buf_block_t*    block;
        mtr_t           mtr;

        if (!buf_page_peek_if_search_hashed(space, page_no)) {

                return;
        }

        mtr_start(&mtr);

        /* We assume that if the caller has a latch on the page, then the
        caller has already dropped the hash index for the page, and we never
        get here. Therefore we can acquire the s-latch to the page without
        having to fear a deadlock. */

        block = buf_page_get_gen(space, zip_size, page_no, RW_S_LATCH, NULL,
                                BUF_GET_IF_IN_POOL, __FILE__, __LINE__,
                                &mtr);

I think it was really hung because a few minutes later this was on the same page number.

#6  0x000000000081f28d in btr_search_drop_page_hash_when_freed (space=34, zip_size=8192, page_no=263269) at btr/btr0sea.c:1243

How to repeat:
disable AHI on a busy server

And this is a summary of the other thread stacks

    38 pthread_cond_wait@@GLIBC_2.3.2,os_event_wait_low,os_aio_simulated_handle,fil_aio_wait,io_handler_thread,start_thread,clone,??
     11 pthread_cond_wait@@GLIBC_2.3.2,MYSQL_BIN_LOG::wait_for_update,mysql_binlog_send,dispatch_command,do_command,handle_one_connection,start_thread,clone,??
     10 pthread_cond_wait@@GLIBC_2.3.2,os_event_wait_low,sync_array_wait_event,rw_lock_s_lock_spin,rw_lock_s_lock_func,row_mysql_freeze_data_dictionary_func,row_undo,row_undo_step,que_thr_step,que_run_threads_low,que_run_threads,trx_general_rollback_for_mysql,row_mysql_handle_errors,row_insert_for_mysql,ha_innobase::write_row,handler::ha_write_row,write_record,mysql_insert,mysql_execute_command,mysql_parse,dispatch_command,do_command,handle_one_connection,start_thread,clone,??
      6 pthread_cond_wait@@GLIBC_2.3.2,os_event_wait_low,sync_array_wait_event,rw_lock_s_lock_spin,rw_lock_s_lock_func,mtr_s_lock_func,btr_cur_search_to_nth_level,btr_pcur_open_func,row_search_index_entry,row_purge_remove_sec_if_poss_low,row_purge,row_purge_step,que_thr_step,que_run_threads_low,que_run_threads,trx_purge_worker,srv_purge_worker_thread,start_thread,clone,??
      4 read,my_real_read,my_net_read,do_command,handle_one_connection,start_thread,clone,??
      1 select,os_thread_sleep,sync_array_print_long_waits,srv_error_monitor_thread,start_thread,clone,??
      1 select,os_thread_sleep,srv_monitor_thread,start_thread,clone,??
      1 select,os_thread_sleep,srv_LRU_dump_restore_thread,start_thread,clone,??
      1 select,os_thread_sleep,srv_lock_timeout_thread,start_thread,clone,??
      1 pthread_cond_wait@@GLIBC_2.3.2,os_event_wait_low,sync_array_wait_event,rw_lock_x_lock_low,rw_lock_x_lock_func,row_mysql_lock_data_dictionary_func,ha_innobase::info,get_schema_tables_record,get_all_tables,get_schema_tables_result,JOIN::exec,mysql_select,handle_select,execute_sqlcom_select,mysql_execute_command,mysql_parse,dispatch_command,do_command,handle_one_connection,start_thread,clone,??
      1 pthread_cond_wait@@GLIBC_2.3.2,os_event_wait_low,sync_array_wait_event,rw_lock_x_lock_func,buf_pool_drop_hash_index,btr_search_disable,sys_var_pluginvar::update,set_var::update,sql_set_variables,mysql_execute_command,mysql_parse,dispatch_command,do_command,handle_one_connection,start_thread,clone,??
      1 pthread_cond_wait@@GLIBC_2.3.2,os_event_wait_low,sync_array_wait_event,rw_lock_s_lock_spin,rw_lock_s_lock_func,mtr_s_lock_func,btr_cur_search_to_nth_level,row_ins_index_entry_low,row_ins_index_entry,row_ins_index_entry_step,row_ins,row_ins_step,row_insert_for_mysql,ha_innobase::write_row,handler::ha_write_row,write_record,mysql_insert,mysql_execute_command,mysql_parse,dispatch_command,do_command,handle_one_connection,start_thread,clone,??
      1 pthread_cond_wait@@GLIBC_2.3.2,os_event_wait_low,sync_array_wait_event,rw_lock_s_lock_spin,rw_lock_s_lock_func,mtr_s_lock_func,btr_cur_search_to_nth_level,btr_pcur_open_func,row_search_index_entry,row_purge_remove_sec_if_poss_low,row_purge,row_purge_step,que_thr_step,que_run_threads_low,que_run_threads,trx_purge,srv_purge_thread,start_thread,clone,??
      1 pthread_cond_wait@@GLIBC_2.3.2,os_event_wait_low,sync_array_wait_event,rw_lock_s_lock_spin,rw_lock_s_lock_func,buf_page_get_gen,btr_search_drop_page_hash_when_freed,fseg_free_page_low,fseg_free_page,btr_compress,btr_cur_compress_if_useful,btr_cur_pessimistic_delete,row_purge_remove_sec_if_poss_low,row_purge,row_purge_step,que_thr_step,que_run_threads_low,que_run_threads,trx_purge_worker,srv_purge_worker_thread,start_thread,clone,??
      1 pthread_cond_wait@@GLIBC_2.3.2,os_event_wait_low,sync_array_wait_event,rw_lock_s_lock_spin,buf_flush_page,buf_flush_try_neighbors,buf_flush_batch,srv_master_thread,start_thread,clone,??
      1 pthread_cond_timedwait@@GLIBC_2.3.2,kill_server,kill_server_thread,start_thread,clone,??
      1 __lll_mutex_lock_wait,_L_mutex_lock_1133,pthread_mutex_lock,THD::init,THD::THD,handle_connections_sockets,main
      1 do_sigwait,sigwait,signal_hand,start_thread,clone,??

pthread_cond_wait@@GLIBC_2.3.2,os_event_wait_low,sync_array_wait_event,rw_lock_s_lock_spin,rw_lock_s_lock_func,mtr_s_lock_func,btr_cur_search_to_nth_level,btr_pcur_open_func,row_search_index_entry,row_purge_remove_sec_if_poss_low,row_purge,row_purge_step,que_thr_step,que_run_threads_low,que_run_threads,trx_purge_worker,srv_purge_worker_thread,start_thread,clone,??

This doesn't look like Oracle code:

  trx_purge_worker
  srv_purge_worker_thread

Sure it is. It just isn't 5.1.52 code. That is the innodb multi-threaded purge patch. I think it originated from one of your branches and then you redid it. Note that without this patch we might not be running mysql today.

Which branch, do you remember? The reason is that I don't remember writing a patch that
worked like this. All versions of MT purge have used the work queue. So, I'm curious.  All
worker threads always call trx_purge() and always have, I don't recall having a special function
trx_purge_worker().

You would have to ask Percona. I thought that an official branch (trunk or something) was the basis for their patch. I don't want to take credit away from them for code they might have originated though. But when I needed multi-threaded purge, their patch was much easier to port than the official fix in 5.6.

could this be related?

http://bugs.mysql.com/bug.php?id=59225
(Deadlock between DML and innodb_adaptive_hash_index=OFF)

ok, then it is not our patch. I can see why back porting the 5.6 patch is more difficult. The kernel
mutex split and code reorganisation would make it problematic. Though I don't think purge is
related to this bug.

i got a hang. check thread 23.

Attachment: bug63196_5.5.17_hang.txt (text/plain), 195.44 KiB.

thread apply all bt full

Attachment: bug63196_5.5.17_hang_thread_apply_all_bt_full.txt.zip (application/octet-stream, text), 49.28 KiB.

same testcase as above, with 5.5.13, also hangup after a while.

Attachment: gdb_5.5.13_hang_stacks.txt (text/plain), 201.64 KiB.

repeated a hang quicker on 5.1.59+plugin.... stacks.

Attachment: gdb_5.1.59_plugin_hang_stacks.txt (text/plain), 115.82 KiB.

this is not repeatable on mysql-trunk after 48 hours.

Apparently the bug is not repeatable on current versions due to the fix for http://bugs.mysql.com/bug.php?id=62487 having fixed this issue also.