Bug #47711 | infinite hang and 100% cpu usage with concurrent engine=merge alters | ||
---|---|---|---|
Submitted: | 29 Sep 2009 12:20 | Modified: | 29 Sep 2009 12:58 |
Reporter: | Shane Bester (Platinum Quality Contributor) | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: Merge storage engine | Severity: | S1 (Critical) |
Version: | 5.0.85, 5.1.39 | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[29 Sep 2009 12:20]
Shane Bester
[29 Sep 2009 12:26]
MySQL Verification Team
thread stacks of the 1 hung and 1 looping thread.
Attachment: bug47711_thread_stacks.txt (text/plain), 6.09 KiB.
[29 Sep 2009 12:58]
MySQL Verification Team
Thank you for the bug report. Verified as described on Ubuntu 64-bit.
[21 Oct 2009 15:23]
Martin Dimitrov
I was also able to reproduce this bug, and tried to do some debugging as well. I printed out a trace, showing that the two threads are stuck in an infinite loop in function remove_table_from_cache(). Thread 1 9933972 T@-1264641: | | | | | | | | | | | | >hash_first 9933973 T@-1264641: | | | | | | | | | | | | | exit: found key at 6 9933974 T@-1264641: | | | | | | | | | | | | <hash_first 9933975 T@-1264641: | | | | | | | | | | | | info: Found another active instance of the table 9933976 T@-1264641: | | | | | | | | | | | | >mysql_lock_abort_for_thread 9933977 T@-1264641: | | | | | | | | | | | | | >get_lock_data 9933978 T@-1264641: | | | | | | | | | | | | | | >my_malloc 9933979 T@-1264641: | | | | | | | | | | | | | | | my: size: 20 my_flags: 0 9933980 T@-1264641: | | | | | | | | | | | | | | | exit: ptr: 0x9c0fa48 9933981 T@-1264641: | | | | | | | | | | | | | | <my_malloc 9933982 T@-1264641: | | | | | | | | | | | | | <get_lock_data 9933983 T@-1264641: | | | | | | | | | | | | | >my_free 9933984 T@-1264641: | | | | | | | | | | | | | | my: ptr: 0x9c0fa48 9933985 T@-1264641: | | | | | | | | | | | | | <my_free 9933986 T@-1264641: | | | | | | | | | | | | <mysql_lock_abort_for_thread 9933987 T@-1264641: | | | | | | | | | | | | info: Waiting for refresh signal Thread 2 9933988 T@-1264837: | | | | | | | | | | | | >hash_first 9933989 T@-1264837: | | | | | | | | | | | | | exit: found key at 6 9933990 T@-1264837: | | | | | | | | | | | | <hash_first 9933991 T@-1264837: | | | | | | | | | | | | info: Found another active instance of the table 9933992 T@-1264837: | | | | | | | | | | | | >mysql_lock_abort_for_thread 9933993 T@-1264837: | | | | | | | | | | | | | >get_lock_data 9933994 T@-1264837: | | | | | | | | | | | | | | >my_malloc 9933995 T@-1264837: | | | | | | | | | | | | | | | my: size: 20 my_flags: 0 9933996 T@-1264837: | | | | | | | | | | | | | | | exit: ptr: 0x9c0fa48 9933997 T@-1264837: | | | | | | | | | | | | | | <my_malloc 9933998 T@-1264837: | | | | | | | | | | | | | <get_lock_data 9933999 T@-1264837: | | | | | | | | | | | | | >my_free 9934000 T@-1264837: | | | | | | | | | | | | | | my: ptr: 0x9c0fa48 9934001 T@-1264837: | | | | | | | | | | | | | <my_free 9934002 T@-1264837: | | | | | | | | | | | | <mysql_lock_abort_for_thread 9934003 T@-1264837: | | | | | | | | | | | | info: Waiting for refresh signal And this repeats into infinity... In the PREREQUISITES for remove_table_from_cache(), it is mentioned that the thread must own LOCK_open upon entering this function. Thus I became curious how come both threads are stuck in this function if they need the mutex LOCK_open. Interestingly, around line sql_base.cc:6197 we release the LOCK_open lock, thus allowing the other thread to advance (at this time the other thread was waiting for LOCK_open in function mysql_create_table(), see my stack trace below). 6196 set_timespec(abstime, 10); 6197 pthread_cond_timedwait(&COND_refresh, &LOCK_open, &abstime); This was confirmed using GDB, which shows that both threads alternate holding LOCK_open, each one releasing it after line 6197 and allowing the other one to grab it. Hardware watchpoint 7: LOCK_open->thread Old value = 3031100304 New value = 3030903696 safe_cond_timedwait (cond=0x8665160, mp=0x8664c80, abstime=0xb4a7b0f0, file=0x84d4700 "sql_base.cc", line=6197) at thr_mutex.c:284 ------------- stack traces: [Switching to Thread -1265439856 (LWP 31561)] Breakpoint 1, remove_table_from_cache (thd=0xa4f4140, db=0xa4c3360 "test", table_name=0xa4c31d0 "t1", flags=2) at sql_base.cc:6139 6139 DBUG_PRINT("info", ("Found another active instance of the table")); (gdb) bt #0 remove_table_from_cache (thd=0xa4f4140, db=0xa4c3360 "test", table_name=0xa4c31d0 "t1", flags=2) at sql_base.cc:6139 #1 0x081dc5fe in wait_while_table_is_used (thd=0xa4f4140, table=0xa4c49f0, function=HA_EXTRA_PREPARE_FOR_DELETE) at sql_table.cc:1969 #2 0x081dcc11 in close_cached_table (thd=0xa4f4140, table=0xa4c49f0) at sql_table.cc:1994 #3 0x081e4759 in mysql_alter_table (thd=0xa4f4140, new_db=0xa510f80 "test", new_name=0xa5121a8 "t1", create_info=0xb492c85c, table_list=0xa5121d0, alter_info=0xb492c8dc, order_num=0, order=0x0, ignore=false) at sql_table.cc:3985 #4 0x0810547a in mysql_execute_command (thd=0xa4f4140) at sql_parse.cc:3547 #5 0x0823b601 in sp_instr_stmt::exec_core (this=0xa512370, thd=0xa4f4140, nextp=0xb492ce10) at sp_head.cc:2698 #6 0x0823b7d4 in sp_lex_keeper::reset_lex_and_exec_core (this=0xa512398, thd=0xa4f4140, nextp=0xb492ce10, open_tables=false, instr=0xa512370) at sp_head.cc:2543 #7 0x08242b7a in sp_instr_stmt::execute (this=0xa512370, thd=0xa4f4140, nextp=0xb492ce10) at sp_head.cc:2651 #8 0x08240210 in sp_head::execute (this=0xa510ca0, thd=0xa4f4140) at sp_head.cc:1154 #9 0x08240b74 in sp_head::execute_procedure (this=0xa510ca0, thd=0xa4f4140, args=0xa4f533c) at sp_head.cc:1815 #10 0x08106387 in mysql_execute_command (thd=0xa4f4140) at sql_parse.cc:4949 #11 0x0810832a in mysql_parse (thd=0xa4f4140, inBuf=0xa4fd658 "call p1(10)", length=11, found_semicolon=0xb492dfa8) at sql_parse.cc:6437 #12 0x08108961 in dispatch_command (command=COM_QUERY, thd=0xa4f4140, packet=0xa4f5629 "call p1(10)", packet_length=12) at sql_parse.cc:1959 #13 0x0810aa0d in handle_one_connection (arg=0xa4f4140) at sql_parse.cc:1640 #14 0x083d76da in start_thread () #15 0x0841418e in clone () (gdb) info threads * 13 Thread -1265439856 (LWP 31561) remove_table_from_cache (thd=0xa4f4140, db=0xa4c3360 "test", table_name=0xa4c31d0 "t1", flags=2) at sql_base.cc:6139 12 Thread -1265243248 (LWP 31559) 0x083dae24 in __lll_lock_wait () (gdb) thread 12 [Switching to thread 12 (Thread -1265243248 (LWP 31559))]#0 0x083dae24 in __lll_lock_wait () (gdb) bt #0 0x083dae24 in __lll_lock_wait () #1 0x083d8ef5 in _L_lock_836 () #2 0x083d8e36 in pthread_mutex_lock () #3 0x083adb65 in safe_mutex_lock (mp=0x8664c80, try_lock=0 '\0', file=0x84eb5c0 "sql_table.cc", line=1739) at thr_mutex.c:149 #4 0x081e07ad in mysql_create_table (thd=0xa4b9b50, db=0xa503028 "test", table_name=0xb495beec "#sql-7b38_1", create_info=0xb495c85c, alter_info=0xb495bf3c, internal_tmp_table=true, select_field_count=0) at sql_table.cc:1739 #5 0x081e40c6 in mysql_alter_table (thd=0xa4b9b50, new_db=0xa503028 "test", new_name=0xa504250 "t1", create_info=0xb495c85c, table_list=0xa504278, alter_info=0xb495c8dc, order_num=0, order=0x0, ignore=false) at sql_table.cc:3786 #6 0x0810547a in mysql_execute_command (thd=0xa4b9b50) at sql_parse.cc:3547 #7 0x0823b601 in sp_instr_stmt::exec_core (this=0xa504418, thd=0xa4b9b50, nextp=0xb495ce10) at sp_head.cc:2698 #8 0x0823b7d4 in sp_lex_keeper::reset_lex_and_exec_core (this=0xa504440, thd=0xa4b9b50, nextp=0xb495ce10, open_tables=false, instr=0xa504418) at sp_head.cc:2543 #9 0x08242b7a in sp_instr_stmt::execute (this=0xa504418, thd=0xa4b9b50, nextp=0xb495ce10) at sp_head.cc:2651 #10 0x08240210 in sp_head::execute (this=0xa502d48, thd=0xa4b9b50) at sp_head.cc:1154 #11 0x08240b74 in sp_head::execute_procedure (this=0xa502d48, thd=0xa4b9b50, args=0xa4bad4c) at sp_head.cc:1815 #12 0x08106387 in mysql_execute_command (thd=0xa4b9b50) at sql_parse.cc:4949 #13 0x0810832a in mysql_parse (thd=0xa4b9b50, inBuf=0xa4f2138 "call p1(1000)", length=13, found_semicolon=0xb495dfa8) at sql_parse.cc:6437 #14 0x08108961 in dispatch_command (command=COM_QUERY, thd=0xa4b9b50, packet=0xa4ea109 "", packet_length=14) at sql_parse.cc:1959 #15 0x0810aa0d in handle_one_connection (arg=0xa4b9b50) at sql_parse.cc:1640 #16 0x083d76da in start_thread () #17 0x0841418e in clone () --------------------- I hope this additional information helps. I am not very familiar with this code, and I still cannot understand the rest of this problem.