Bug #39193 | Deadlock in recover_from_failed_open_table_attempt() | ||
---|---|---|---|
Submitted: | 2 Sep 2008 16:10 | Modified: | 2 Jul 2009 15:40 |
Reporter: | Philip Stoev | Email Updates: | |
Status: | Can't repeat | Impact on me: | |
Category: | MySQL Server | Severity: | S2 (Serious) |
Version: | 6.0-bzr, 5.4 | OS: | Any |
Assigned to: | Jon Olav Hauglid | CPU Architecture: | Any |
[2 Sep 2008 16:10]
Philip Stoev
[2 Sep 2008 16:11]
Philip Stoev
Thread stacks for bug 39193
Attachment: bug39193.threads (text/plain), 33.08 KiB.
[2 Sep 2008 16:11]
Philip Stoev
mysql> show engine innodb status\G *************************** 1. row *************************** Type: InnoDB Name: Status: ===================================== 080902 19:10:41 INNODB MONITOR OUTPUT ===================================== Per second averages calculated from the last 3 seconds ---------- SEMAPHORES ---------- OS WAIT ARRAY INFO: reservation count 125, signal count 119 Mutex spin waits 0, rounds 15924, OS waits 73 RW-shared spins 49, OS waits 22; RW-excl spins 48, OS waits 19 ------------ TRANSACTIONS ------------ Trx id counter 0 5085 Purge done for trx's n:o < 0 5048 undo n:o < 0 0 History list length 9 Total number of lock structs in row lock hash table 18 LIST OF TRANSACTIONS FOR EACH SESSION: ---TRANSACTION 0 0, not started, process no 15193, OS thread id 2798889872 MySQL thread id 72, query id 10792 localhost 127.0.0.1 root show engine innodb status ---TRANSACTION 0 0, not started, process no 15193, OS thread id 2818427792 MySQL thread id 67, query id 10736 localhost root Waiting for table LOCK TABLES `inter1` READ /*!32311 LOCAL */ ---TRANSACTION 0 5069, not started, process no 15193, OS thread id 2816691088 MySQL thread id 53, query id 10710 localhost root Waiting for table CREATE TABLE IF NOT EXISTS interact_xml.inter1 ENGINE = Falcon SELECT * FROM test.inter1 LIMIT 0 ---TRANSACTION 0 5064, not started, process no 15193, OS thread id 2818227088 MySQL thread id 48, query id 10672 localhost root ---TRANSACTION 0 0, not started, process no 15193, OS thread id 2816490384 mysql tables in use 1, locked 1 MySQL thread id 43, query id 10653 localhost root Table lock ALTER TABLE inter1 ADD KEY k1 (t1_uuid) ---TRANSACTION 0 4994, not started, process no 15193, OS thread id 2799491984 mysql tables in use 1, locked 1 MySQL thread id 42, query id 10570 localhost root Table lock ALTER TABLE inter1 CHANGE t1_uuid t1_uuid VARCHAR(36) ---TRANSACTION 0 5078, ACTIVE 15 sec, process no 15193, OS thread id 2799692688 fetching rows mysql tables in use 2, locked 2 LOCK WAIT 13 lock struct(s), heap size 1024, 2011 row lock(s), undo log entries 2001 MySQL thread id 47, query id 10552 localhost root copy to tmp table ALTER TABLE inter1 ENGINE = Innodb ------- TRX HAS BEEN WAITING 15 SEC FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 0 page no 268 n bits 168 index `PRIMARY` of table `test`.`#mysql50#inter1#P#p0` trx id 0 5078 lock_mode X waiting Record lock, heap no 92 PHYSICAL RECORD: n_fields 7; compact format; info bits 32 0: len 4; hex 800007d8; asc ;; 1: len 6; hex 0000000013b4; asc ;; 2: len 7; hex 000000010b1896; asc ;; 3: len 30; hex 38336332383232322d373930382d313164642d613061312d303030633239; asc 83c28222-7908-11dd-a0a1-000c29;...(truncated); 4: SQL NULL; 5: SQL NULL; 6: SQL NULL; ------------------ ---TRANSACTION 0 5070, ACTIVE 297 sec, process no 15193, OS thread id 2818026384 4 lock struct(s), heap size 320, 2 row lock(s), undo log entries 3 MySQL thread id 55, query id 10709 localhost root Waiting for table INSERT INTO view2 (t1_uuid) SELECT (SELECT t1_uuid FROM view1 WHERE t1_uuid = @uuid1) ---TRANSACTION 0 5044, ACTIVE 502 sec, process no 15193, OS thread id 2803403664 8 lock struct(s), heap size 1024, 6 row lock(s), undo log entries 4 MySQL thread id 35, query id 10752 localhost root Waiting for table SELECT view1.t1_uuid, view2.t2_date FROM view1 LEFT JOIN view2 USING (t1_uuid) WHERE t1_blob = @t1_uuid Trx read view will not see trx with id >= 0 5045, sees < 0 5045 -------- FILE I/O -------- I/O thread 0 state: waiting for i/o request (insert buffer thread) I/O thread 1 state: waiting for i/o request (log thread) I/O thread 2 state: waiting for i/o request (read thread) I/O thread 3 state: waiting for i/o request (write thread) Pending normal aio reads: 0, aio writes: 0, ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0 Pending flushes (fsync) log: 0; buffer pool: 0 0 OS file reads, 2324 OS file writes, 2243 OS fsyncs 0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s ------------------------------------- INSERT BUFFER AND ADAPTIVE HASH INDEX ------------------------------------- Ibuf: size 1, free list len 0, seg size 2, 0 inserts, 0 merged recs, 0 merges Hash table size 34679, used cells 21, node heap has 1 buffer(s) 0.00 hash searches/s, 0.00 non-hash searches/s --- LOG --- Log sequence number 0 9502254 Log flushed up to 0 9502254 Last checkpoint at 0 9502254 0 pending log writes, 0 pending chkp writes 2221 log i/o's done, 0.00 log i/o's/second ---------------------- BUFFER POOL AND MEMORY ---------------------- Total memory allocated 18772970; in additional pool allocated 952320 Dictionary memory allocated 44960 Buffer pool size 512 Free buffers 197 Database pages 314 Modified db pages 0 Pending reads 0 Pending writes: LRU 0, flush list 0, single page 0 Pages read 0, created 314, written 552 0.00 reads/s, 0.00 creates/s, 0.00 writes/s Buffer pool hit rate 1000 / 1000 -------------- ROW OPERATIONS -------------- 0 queries inside InnoDB, 0 queries in queue 2 read views open inside InnoDB Main thread process no. 15193, id 2839817104, state: waiting for server activity Number of rows inserted 46074, updated 8, deleted 2017, read 74132 0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s ---------------------------- END OF INNODB MONITOR OUTPUT ============================ 1 row in set (0.00 sec)
[2 Sep 2008 16:32]
Philip Stoev
Current locks: lock: 0x9da8a60: lock: 0xa72f2f20: lock: 0x9d1f438: lock: 0x9d46c44: lock: 0xb7204c70: lock: 0xa712b638: lock: 0xa72f6968: lock: 0xa72f3500: lock: 0x9d2f1e8: lock: 0xb71fa2c0: lock: 0x9d1035c: lock: 0x9cf8a2c: lock: 0x9ce6e34: lock: 0x9cdc994: lock: 0x9cd193c: lock: 0x9cc85b4: lock: 0x9cc05b4: lock: 0x9cb0a44: lock: 0x9cab02c: lock: 0x9ca3994: DB Table Version Thread Open Lock mysql db 1 0 1 Not in use interact_xml inter1 1 0 1 Not in use mysql tables_priv 1 0 1 Not in use test t2_eng2 1 0 1 Not in use mysql procs_priv 1 0 1 Not in use test inter2_log 1 0 1 Not in use test inter2_log 1 0 1 Not in use test t1_eng1 1 0 1 Not in use test inter2 1 0 1 Not in use test inter2 1 0 1 Not in use test inter2 1 0 1 Not in use mysql host 1 0 1 Not in use test space1 1 0 1 Not in use mysql proc 1 0 1 Not in use mysql user 1 0 1 Not in use mysql columns_priv 1 0 1 Not in use mysql event 1 0 1 Not in use interact_xml inter2 1 0 1 Not in use test inter1 0 0 1 Not in use test inter1 0 0 1 Not in use test inter1 0 0 1 Not in use test inter1_log 1 0 1 Not in use test inter1_log 1 0 1 Not in use test inter1_log 1 0 1 Not in use test inter1_log 1 0 1 Not in use mysql servers 1 0 1 Not in use mysql general_log 1 0 1 Not in use mysql general_log 1 0 1 Not in use mysql general_log 1 0 1 Not in use mysql slow_log 1 0 1 Not in use
[2 Sep 2008 16:42]
Philip Stoev
A variation of this was just observed -- all threads are locked, and one loops in the code below. SHOW PROCESSLIST shows all threads alternating between "Waiting for table" and "Opening tables": 3669 free_root(&new_frm_mem, MYF(MY_KEEP_PREALLOC)); (gdb) 3671 if (error) (gdb) 3673 if (action) (gdb) 3689 if (query_tables_last_own) (gdb) 3691 close_tables_for_reopen(thd, start, (action == OT_BACK_OFF_AND_RETRY)); (gdb) 3697 if (recover_from_failed_open_table_attempt(thd, tables, action)) (gdb) 3544 *counter= 0; (gdb) 3546 thd_proc_info(thd, "Opening tables"); (gdb) 3557 if (thd->handler_tables) (gdb) 3566 if (thd->locked_tables_mode <= LTM_LOCK_TABLES && (gdb) 3597 for (tables= *start; tables ;tables= tables->next_global) (gdb) 3599 DBUG_PRINT("tcache", ("opening table: '%s'.'%s' item: %p", (gdb) 3610 if (tables->derived) (gdb) 3631 if (tables->schema_table) (gdb) 3647 (*counter)++; (gdb) 3650 DBUG_ASSERT(!tables->table); (gdb) 3652 if (tables->prelocking_placeholder) (gdb) 3667 error= open_table(thd, tables, &new_frm_mem, &action, flags); (gdb) 3669 free_root(&new_frm_mem, MYF(MY_KEEP_PREALLOC));
[8 May 2009 9:28]
Olav Sandstå
I sometimes see an almost similar deadlock situation when running the RQG test named falcon_recovery. The test reports deadlock and when it kills the running mysqld process none of the threads are running in Falcon code but several are blocked in server code. About five or six threads are in "recover_from_failed_open_table_attempt()" when this happens with the following call stack: # 16:06:43 #0 0x00b99402 in __kernel_vsyscall () # 16:06:43 #1 0x0089e256 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0 # 16:06:43 #2 0x0883b647 in safe_cond_wait (cond=0x8bdf380, mp=0x8bdec40, # 16:06:43 file=0x89a1076 "sql_base.cc", line=7785) at thr_mutex.c:423 # 16:06:43 #3 0x08325ec6 in tdc_wait_for_old_versions (thd=0xb71a078, # 16:06:43 mdl_context=0xb71a0ec) at sql_base.cc:7785 # 16:06:43 #4 0x08329b5b in recover_from_failed_open_table_attempt (thd=0xb71a078, # 16:06:43 table=0xb757e80, action=OT_BACK_OFF_AND_RETRY) at sql_base.cc:3513 # 16:06:43 #5 0x0832bb36 in open_tables (thd=0xb71a078, start=0xa708eda4, # 16:06:43 counter=0xa708ed90, flags=0) at sql_base.cc:3802 # 16:06:43 #6 0x0832c195 in open_and_lock_tables_derived (thd=0xb71a078, # 16:06:43 tables=0xb757e80, derived=true, flags=0) at sql_base.cc:4224 # 16:06:43 #7 0x082df512 in open_and_lock_tables (thd=0xb71a078, tables=0xb757e80) # 16:06:43 at ../../sql/mysql_priv.h:1597 # 16:06:43 #8 0x08379b42 in mysql_insert (thd=0xb71a078, table_list=0xb757e80, # 16:06:43 fields=@0xb71b684, values_list=@0xb71b6a8, update_fields=@0xb71b69c, # 16:06:43 update_values=@0xb71b690, duplic=DUP_ERROR, ignore=false) # 16:06:43 at sql_insert.cc:622 # 16:06:43 #9 0x082d60ec in mysql_execute_command (thd=0xb71a078) at sql_parse.cc:3244 # 16:06:43 #10 0x082dbdbb in mysql_parse (thd=0xb71a078, # 16:06:43 inBuf=0xb757cf0 "INSERT INTO `table10_falcon_int_autoinc` (`int`) VALUES ( 6 )", length=61, found_semicolon=0xa708fe80) at sql_parse.cc:5964 # 16:06:43 #11 0x082dcf3d in dispatch_command (command=COM_QUERY, thd=0xb71a078, # 16:06:43 packet=0xb74be31 "INSERT INTO `table10_falcon_int_autoinc` (`int`) VALUES ( 6 )", packet_length=61) at sql_parse.cc:1049 # 16:06:43 #12 0x082de1ce in do_command (thd=0xb71a078) at sql_parse.cc:731 # 16:06:43 #13 0x082cab73 in handle_one_connection (arg=0xb71a078) at sql_connect.cc:1146 # 16:06:43 #14 0x0089a45b in start_thread () from /lib/libpthread.so.0 # 16:06:43 #15 0x007f1c4e in clone () from /lib/libc.so.6 There are also threads hanging with the following call stacks: # 16:06:43 #0 0x00b99402 in __kernel_vsyscall () # 16:06:43 #1 0x0089e4dc in pthread_cond_timedwait@@GLIBC_2.3.2 () # 16:06:43 from /lib/libpthread.so.0 # 16:06:43 #2 0x0883b884 in safe_cond_timedwait (cond=0x8beb720, mp=0x8beb6a0, # 16:06:43 abstime=0xa6f0651c, file=0x89e1ae0 "mdl.cc", line=1127) at thr_mutex.c:477 # 16:06:43 #3 0x084d7b13 in MDL_ticket::upgrade_shared_lock_to_exclusive ( # 16:06:43 this=0xa6357d40) at mdl.cc:1127 # 16:06:43 #4 0x08327d9a in wait_while_table_is_used (thd=0xb7e64c8, table=0xa633e770, # 16:06:43 function=HA_EXTRA_PREPARE_FOR_RENAME) at sql_base.cc:2088 # 16:06:43 #5 0x0842c49d in mysql_fast_or_online_alter_table (thd=0xb7e64c8, # 16:06:43 table_list=0xb816a80, altered_table=0xa63ad958, create_info=0xa6f07204, # 16:06:43 alter_info=0xa6f06f3c, ha_alter_flags=0xa6f06f20, keys_onoff=LEAVE_AS_IS) # 16:06:43 at sql_table.cc:6028 # 16:06:43 #6 0x0842f617 in mysql_alter_table (thd=0xb7e64c8, new_db=0xb816c88 "test", # 16:06:43 new_name=0xb8169d8 "table10_falcon_int_autoinc", create_info=0xa6f07204, # 16:06:43 table_list=0xb816a80, alter_info=0xa6f076fc, order_num=0, order=0x0, # 16:06:43 ignore=false) at sql_table.cc:7179 # 16:06:43 #7 0x082d4f7c in mysql_execute_command (thd=0xb7e64c8) at sql_parse.cc:2899 # 16:06:43 #8 0x082dbdbb in mysql_parse (thd=0xb7e64c8, # 16:06:43 inBuf=0xb8168e8 "ALTER TABLE `table10_falcon_int_autoinc` ADD INDEX l ( `char_255` )", length=67, found_semicolon=0xa6f07e80) at sql_parse.cc:5964 # 16:06:43 #9 0x082dcf3d in dispatch_command (command=COM_QUERY, thd=0xb7e64c8, # 16:06:43 packet=0xb80aa29 "ALTER TABLE `table10_falcon_int_autoinc` ADD INDEX l ( `char_255` )", packet_length=67) at sql_parse.cc:1049 # 16:06:43 #10 0x082de1ce in do_command (thd=0xb7e64c8) at sql_parse.cc:731 # 16:06:43 #11 0x082cab73 in handle_one_connection (arg=0xb7e64c8) at sql_connect.cc:1146 # 16:06:43 #12 0x0089a45b in start_thread () from /lib/libpthread.so.0 # 16:06:43 #13 0x007f1c4e in clone () from /lib/libc.so.6 # 16:06:43 #0 0x00b99402 in __kernel_vsyscall () # 16:06:43 #1 0x0089e4dc in pthread_cond_timedwait@@GLIBC_2.3.2 () # 16:06:43 from /lib/libpthread.so.0 # 16:06:43 #2 0x0883b884 in safe_cond_timedwait (cond=0x8beb720, mp=0x8beb6a0, # 16:06:43 abstime=0xa6f0651c, file=0x89e1ae0 "mdl.cc", line=1127) at thr_mutex.c:477 # 16:06:43 #3 0x084d7b13 in MDL_ticket::upgrade_shared_lock_to_exclusive ( # 16:06:43 this=0xa6357d40) at mdl.cc:1127 # 16:06:43 #4 0x08327d9a in wait_while_table_is_used (thd=0xb7e64c8, table=0xa633e770, # 16:06:43 function=HA_EXTRA_PREPARE_FOR_RENAME) at sql_base.cc:2088 # 16:06:43 #5 0x0842c49d in mysql_fast_or_online_alter_table (thd=0xb7e64c8, # 16:06:43 table_list=0xb816a80, altered_table=0xa63ad958, create_info=0xa6f07204, # 16:06:43 alter_info=0xa6f06f3c, ha_alter_flags=0xa6f06f20, keys_onoff=LEAVE_AS_IS) # 16:06:43 at sql_table.cc:6028 # 16:06:43 #6 0x0842f617 in mysql_alter_table (thd=0xb7e64c8, new_db=0xb816c88 "test", # 16:06:43 new_name=0xb8169d8 "table10_falcon_int_autoinc", create_info=0xa6f07204, # 16:06:43 table_list=0xb816a80, alter_info=0xa6f076fc, order_num=0, order=0x0, # 16:06:43 ignore=false) at sql_table.cc:7179 # 16:06:43 #7 0x082d4f7c in mysql_execute_command (thd=0xb7e64c8) at sql_parse.cc:2899 # 16:06:43 #8 0x082dbdbb in mysql_parse (thd=0xb7e64c8, # 16:06:43 inBuf=0xb8168e8 "ALTER TABLE `table10_falcon_int_autoinc` ADD INDEX l ( `char_255` )", length=67, found_semicolon=0xa6f07e80) at sql_parse.cc:5964 # 16:06:43 #9 0x082dcf3d in dispatch_command (command=COM_QUERY, thd=0xb7e64c8, # 16:06:43 packet=0xb80aa29 "ALTER TABLE `table10_falcon_int_autoinc` ADD INDEX l ( `char_255` )", packet_length=67) at sql_parse.cc:1049 # 16:06:43 #10 0x082de1ce in do_command (thd=0xb7e64c8) at sql_parse.cc:731 # 16:06:43 #11 0x082cab73 in handle_one_connection (arg=0xb7e64c8) at sql_connect.cc:1146 # 16:06:43 #12 0x0089a45b in start_thread () from /lib/libpthread.so.0 # 16:06:43 #13 0x007f1c4e in clone () from /lib/libc.so.6
[17 Jun 2009 20:47]
Konstantin Osipov
3726 is part of 5.4
[30 Jun 2009 10:58]
Jon Olav Hauglid
For the call stacks posted by Olav, the problem seems to be he test and not the locking code. In the RQG grammar used for the falcon-recovery test (falcon_recovery.yy), a long-running transaction is defined (stall_serial_log_rotation). This transaction accesses some data and then sleeps for 1800 seconds. While sleeping, the transaction will hold a shared metadata lock that will prevent ALTER TABLE statements on the same table from upgrading to exclusive metadata lock. This is the cause for the threads stalling in MDL_ticket::upgrade_shared_lock_to_exclusive(). Threads stalling in tdc_wait_for_old_versions() is a consequence of ALTER TABLE being unable to complete. The RQG reports deadlock when 3 queries have taken more than 5 minutes to run. Changing the sleep to 10 seconds let the test pass without problems.
[30 Jun 2009 11:41]
Jon Olav Hauglid
Closing this bug as Won't fix based on a discussion with Philip. He is unable to reproduce the reported problem consistently and will post a new bug report if it occurs again. Olav's problem has been resolved as described above. Note for future reference: Olav's report by mistake includes the same call stack twice. There should only be once call stack for ALTER TABLE.
[2 Jul 2009 15:40]
Konstantin Osipov
Correct status for is "Can't repeat" then.