| Bug #63196 | set innodb_adaptive_hash_index=0 hangs server | ||
|---|---|---|---|
| Submitted: | 11 Nov 2011 6:19 | Modified: | 17 Jul 2013 14:14 |
| Reporter: | Mark Callaghan | Email Updates: | |
| Status: | Can't repeat | Impact on me: | |
| Category: | MySQL Server: InnoDB Plugin storage engine | Severity: | S1 (Critical) |
| Version: | 5.1.52 | OS: | Any |
| Assigned to: | CPU Architecture: | Any | |
| Tags: | adaptive, hang, hash, innodb | ||
[11 Nov 2011 6:19]
Mark Callaghan
And this is a summary of the other thread stacks
38 pthread_cond_wait@@GLIBC_2.3.2,os_event_wait_low,os_aio_simulated_handle,fil_aio_wait,io_handler_thread,start_thread,clone,??
11 pthread_cond_wait@@GLIBC_2.3.2,MYSQL_BIN_LOG::wait_for_update,mysql_binlog_send,dispatch_command,do_command,handle_one_connection,start_thread,clone,??
10 pthread_cond_wait@@GLIBC_2.3.2,os_event_wait_low,sync_array_wait_event,rw_lock_s_lock_spin,rw_lock_s_lock_func,row_mysql_freeze_data_dictionary_func,row_undo,row_undo_step,que_thr_step,que_run_threads_low,que_run_threads,trx_general_rollback_for_mysql,row_mysql_handle_errors,row_insert_for_mysql,ha_innobase::write_row,handler::ha_write_row,write_record,mysql_insert,mysql_execute_command,mysql_parse,dispatch_command,do_command,handle_one_connection,start_thread,clone,??
6 pthread_cond_wait@@GLIBC_2.3.2,os_event_wait_low,sync_array_wait_event,rw_lock_s_lock_spin,rw_lock_s_lock_func,mtr_s_lock_func,btr_cur_search_to_nth_level,btr_pcur_open_func,row_search_index_entry,row_purge_remove_sec_if_poss_low,row_purge,row_purge_step,que_thr_step,que_run_threads_low,que_run_threads,trx_purge_worker,srv_purge_worker_thread,start_thread,clone,??
4 read,my_real_read,my_net_read,do_command,handle_one_connection,start_thread,clone,??
1 select,os_thread_sleep,sync_array_print_long_waits,srv_error_monitor_thread,start_thread,clone,??
1 select,os_thread_sleep,srv_monitor_thread,start_thread,clone,??
1 select,os_thread_sleep,srv_LRU_dump_restore_thread,start_thread,clone,??
1 select,os_thread_sleep,srv_lock_timeout_thread,start_thread,clone,??
1 pthread_cond_wait@@GLIBC_2.3.2,os_event_wait_low,sync_array_wait_event,rw_lock_x_lock_low,rw_lock_x_lock_func,row_mysql_lock_data_dictionary_func,ha_innobase::info,get_schema_tables_record,get_all_tables,get_schema_tables_result,JOIN::exec,mysql_select,handle_select,execute_sqlcom_select,mysql_execute_command,mysql_parse,dispatch_command,do_command,handle_one_connection,start_thread,clone,??
1 pthread_cond_wait@@GLIBC_2.3.2,os_event_wait_low,sync_array_wait_event,rw_lock_x_lock_func,buf_pool_drop_hash_index,btr_search_disable,sys_var_pluginvar::update,set_var::update,sql_set_variables,mysql_execute_command,mysql_parse,dispatch_command,do_command,handle_one_connection,start_thread,clone,??
1 pthread_cond_wait@@GLIBC_2.3.2,os_event_wait_low,sync_array_wait_event,rw_lock_s_lock_spin,rw_lock_s_lock_func,mtr_s_lock_func,btr_cur_search_to_nth_level,row_ins_index_entry_low,row_ins_index_entry,row_ins_index_entry_step,row_ins,row_ins_step,row_insert_for_mysql,ha_innobase::write_row,handler::ha_write_row,write_record,mysql_insert,mysql_execute_command,mysql_parse,dispatch_command,do_command,handle_one_connection,start_thread,clone,??
1 pthread_cond_wait@@GLIBC_2.3.2,os_event_wait_low,sync_array_wait_event,rw_lock_s_lock_spin,rw_lock_s_lock_func,mtr_s_lock_func,btr_cur_search_to_nth_level,btr_pcur_open_func,row_search_index_entry,row_purge_remove_sec_if_poss_low,row_purge,row_purge_step,que_thr_step,que_run_threads_low,que_run_threads,trx_purge,srv_purge_thread,start_thread,clone,??
1 pthread_cond_wait@@GLIBC_2.3.2,os_event_wait_low,sync_array_wait_event,rw_lock_s_lock_spin,rw_lock_s_lock_func,buf_page_get_gen,btr_search_drop_page_hash_when_freed,fseg_free_page_low,fseg_free_page,btr_compress,btr_cur_compress_if_useful,btr_cur_pessimistic_delete,row_purge_remove_sec_if_poss_low,row_purge,row_purge_step,que_thr_step,que_run_threads_low,que_run_threads,trx_purge_worker,srv_purge_worker_thread,start_thread,clone,??
1 pthread_cond_wait@@GLIBC_2.3.2,os_event_wait_low,sync_array_wait_event,rw_lock_s_lock_spin,buf_flush_page,buf_flush_try_neighbors,buf_flush_batch,srv_master_thread,start_thread,clone,??
1 pthread_cond_timedwait@@GLIBC_2.3.2,kill_server,kill_server_thread,start_thread,clone,??
1 __lll_mutex_lock_wait,_L_mutex_lock_1133,pthread_mutex_lock,THD::init,THD::THD,handle_connections_sockets,main
1 do_sigwait,sigwait,signal_hand,start_thread,clone,??
[11 Nov 2011 6:28]
Sunny Bains
pthread_cond_wait@@GLIBC_2.3.2,os_event_wait_low,sync_array_wait_event,rw_lock_s_lock_spin,rw_lock_s_lock_func,mtr_s_lock_func,btr_cur_search_to_nth_level,btr_pcur_open_func,row_search_index_entry,row_purge_remove_sec_if_poss_low,row_purge,row_purge_step,que_thr_step,que_run_threads_low,que_run_threads,trx_purge_worker,srv_purge_worker_thread,start_thread,clone,?? This doesn't look like Oracle code: trx_purge_worker srv_purge_worker_thread
[11 Nov 2011 6:30]
Mark Callaghan
Sure it is. It just isn't 5.1.52 code. That is the innodb multi-threaded purge patch. I think it originated from one of your branches and then you redid it. Note that without this patch we might not be running mysql today.
[11 Nov 2011 6:36]
Sunny Bains
Which branch, do you remember? The reason is that I don't remember writing a patch that worked like this. All versions of MT purge have used the work queue. So, I'm curious. All worker threads always call trx_purge() and always have, I don't recall having a special function trx_purge_worker().
[11 Nov 2011 6:47]
Mark Callaghan
You would have to ask Percona. I thought that an official branch (trunk or something) was the basis for their patch. I don't want to take credit away from them for code they might have originated though. But when I needed multi-threaded purge, their patch was much easier to port than the official fix in 5.6.
[11 Nov 2011 6:52]
MySQL Verification Team
could this be related? http://bugs.mysql.com/bug.php?id=59225 (Deadlock between DML and innodb_adaptive_hash_index=OFF)
[11 Nov 2011 6:53]
Sunny Bains
ok, then it is not our patch. I can see why back porting the 5.6 patch is more difficult. The kernel mutex split and code reorganisation would make it problematic. Though I don't think purge is related to this bug.
[11 Nov 2011 8:55]
MySQL Verification Team
i got a hang. check thread 23.
Attachment: bug63196_5.5.17_hang.txt (text/plain), 195.44 KiB.
[11 Nov 2011 9:05]
MySQL Verification Team
thread apply all bt full
Attachment: bug63196_5.5.17_hang_thread_apply_all_bt_full.txt.zip (application/octet-stream, text), 49.28 KiB.
[11 Nov 2011 14:56]
MySQL Verification Team
same testcase as above, with 5.5.13, also hangup after a while.
Attachment: gdb_5.5.13_hang_stacks.txt (text/plain), 201.64 KiB.
[11 Nov 2011 15:11]
MySQL Verification Team
repeated a hang quicker on 5.1.59+plugin.... stacks.
Attachment: gdb_5.1.59_plugin_hang_stacks.txt (text/plain), 115.82 KiB.
[14 Nov 2011 4:50]
MySQL Verification Team
this is not repeatable on mysql-trunk after 48 hours.
[17 Jul 2013 14:14]
MySQL Verification Team
Apparently the bug is not repeatable on current versions due to the fix for http://bugs.mysql.com/bug.php?id=62487 having fixed this issue also.

Description: This was done on a running server that was too slow with lots of stalls where it appeared that: 1) many threads were waiting to lock btr_search_latch 2) holder of btr_search_latch was waiting on other mutexes So I took a chance and disabled the AHI. That didn't work out too well This is the stack for the thread trying to disable the AHI Thread 284 (Thread 0x5e1e5940 (LWP 7526)): #0 0x0000003278c0a496 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00000000007aabf5 in os_event_wait_low (event=0x34841810, reset_sig_count=<value optimized out>) at os/os0sync.c:399 #2 0x00000000007ebaa0 in sync_array_wait_event (arr=0x1f474e20, index=11) at sync/sync0arr.c:451 #3 0x00000000007ec21a in rw_lock_s_lock_spin (lock=0x2aaabed26648, pass=<value optimized out>, file_name=0x9b8272 "btr/btr0sea.c", line=1244) at sync/sync0rw.c:436 #4 0x000000000082873a in rw_lock_s_lock_func (space=34, zip_size=8192, offset=263269, rw_latch=1, guess=0x0, mode=11, file=0x9b8272 "btr/btr0sea.c", line=1244, mtr=0x5e1e33c0) at ./incl ude/sync0rw.ic:418 #5 buf_page_get_gen (space=34, zip_size=8192, offset=263269, rw_latch=1, guess=0x0, mode=11, file=0x9b8272 "btr/btr0sea.c", line=1244, mtr=0x5e1e33c0) at buf/buf0buf.c:2535 #6 0x000000000081f28d in btr_search_drop_page_hash_when_freed (space=34, zip_size=8192, page_no=263269) at btr/btr0sea.c:1243 #7 0x0000000000856895 in fseg_free_page_low (seg_inode=0x2aaaef21c272 "", space=34, zip_size=0, page=18446744073709551615, mtr=0x34841810) at fsp/fsp0fsp.c:3258 #8 0x0000000000856ea8 in fseg_free_page (seg_header=0x2aaaef25c04a "", space=34, page=263269, mtr=0x5e1e4590) at fsp/fsp0fsp.c:3395 #9 0x000000000080df20 in btr_compress (cursor=0x5e1e4000, mtr=0x5e1e4590) at btr/btr0btr.c:2810 #10 0x0000000000811ad3 in btr_cur_compress_if_useful (cursor=0x34841854, mtr=0xa) at btr/btr0cur.c:2769 #11 0x000000000081a2a5 in btr_cur_pessimistic_delete (err=0x5e1e40e8, has_reserved_extents=<value optimized out>, cursor=0x5e1e4000, rb_ctx=RB_NONE, mtr=0x5e1e4590) at btr/btr0cur.c:3012 #12 0x00000000007d2fe4 in row_purge_remove_sec_if_poss_low (node=<value optimized out>, index=0x2ab654065ab8, entry=0x2ab65819b748, mode=33) at row/row0purge.c:300 #13 0x00000000007d3ded in row_purge (thr=0x2ab689d4b0b0) at row/row0purge.c:335 #14 row_purge_step (thr=0x2ab689d4b0b0) at row/row0purge.c:700 #15 0x00000000007be64b in que_thr_step (thr=0x2ab689d4b0b0) at que/que0que.c:1248 #16 que_run_threads_low (thr=0x2ab689d4b0b0) at que/que0que.c:1306 #17 que_run_threads (thr=0x2ab689d4b0b0) at que/que0que.c:1343 #18 0x00000000007f0cff in trx_purge_worker (worker_id=<value optimized out>) at trx/trx0purge.c:1238 #19 0x00000000007e4f71 in srv_purge_worker_thread (arg=<value optimized out>) at srv/srv0srv.c:3587 #20 0x0000003278c062f7 in start_thread () from /lib64/libpthread.so.0 #21 0x00000032780d1e3d in clone () from /lib64/libc.so.6 #22 0x0000000000000000 in ?? () In that case thread was at buf_page_get_gen in btr_search_drop_page_hash_when_freed( /*=================================*/ ulint space, /*!< in: space id */ ulint zip_size, /*!< in: compressed page size in bytes or 0 for uncompressed pages */ ulint page_no) /*!< in: page number */ { buf_block_t* block; mtr_t mtr; if (!buf_page_peek_if_search_hashed(space, page_no)) { return; } mtr_start(&mtr); /* We assume that if the caller has a latch on the page, then the caller has already dropped the hash index for the page, and we never get here. Therefore we can acquire the s-latch to the page without having to fear a deadlock. */ block = buf_page_get_gen(space, zip_size, page_no, RW_S_LATCH, NULL, BUF_GET_IF_IN_POOL, __FILE__, __LINE__, &mtr); I think it was really hung because a few minutes later this was on the same page number. #6 0x000000000081f28d in btr_search_drop_page_hash_when_freed (space=34, zip_size=8192, page_no=263269) at btr/btr0sea.c:1243 How to repeat: disable AHI on a busy server