Bug #37963 | all threads wait in 'opening tables' or 'closing tables' status on mutex/lock | ||
---|---|---|---|
Submitted: | 8 Jul 2008 11:06 | Modified: | 13 Nov 2009 17:36 |
Reporter: | Shane Bester (Platinum Quality Contributor) | Email Updates: | |
Status: | Duplicate | Impact on me: | |
Category: | MySQL Server: Locking | Severity: | S2 (Serious) |
Version: | 5.x | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[8 Jul 2008 11:06]
Shane Bester
[22 Jul 2008 7:42]
MySQL Verification Team
This specific occurrence is easily reproducible by adding a sleep() into trx_purge(void) function in ./trx/trx0purge.c here: rw_lock_x_lock(&(purge_sys->latch)); mutex_enter(&kernel_mutex); printf("stopping in trx_purge() for 120 seconds\n"); sleep(120); printf("about to continue..\n"); /* Close and free the old purge view */ now we have access to a myisam blocked due to innodb taking too long: Command | Time | State | Info ---------+------+----------------+----------------------------------------- Query | 108 | Opening tables | select * from mysql.user limit 1
[19 Sep 2008 9:17]
MySQL Verification Team
my analysis of hang in csc29327 In one connection, you have the alter table running: ---TRANSACTION 5 830089219, not started, process no 10333, OS thread id 1165752672 discarding tablespace mysql tables in use 1, locked 1 MySQL thread id 27849451, query id 45934707873 localhost root ALTER TABLE posiciones_tbl DISCARD TABLESPACE Notice that all other connections are stuck in "Opening tables" state. Even queries that access another completely different table! They are hung in mysql code waiting for the global LOCK_open mutex held by the hung 'alter table'. Then we have semaphore waits within innodb, so lets see why it got stuck. ---------- SEMAPHORES ---------- OS WAIT ARRAY INFO: reservation count 1476801360, signal count 1141393786 --Thread 1157642592 has waited at dict0dict.c line 751 for 950.00 seconds the semaphore: Mutex at 0x2aaab8b798b8 created file dict0dict.c line 709, lock var 1 waiters flag 1 --Thread 1140881760 has waited at row0mysql.c line 1720 for 946.00 seconds the semaphore: S-lock on RW-latch at 0xe0d620 created in file dict0dict.c line 725 a writer (thread id 1165752672) has reserved it in mode exclusive number of readers 0, waiters flag 1 Last time read locked in file row0mysql.c line 1720 Last time write locked in file row0mysql.c line 1755 --Thread 1168320864 has waited at dict0dict.c line 781 for 950.00 seconds the semaphore: Mutex at 0x2aaab8b798b8 created file dict0dict.c line 709, lock var 1 waiters flag 1 Mutex spin waits 0, rounds 79151124235, OS waits 942314335 RW-shared spins 627358234, OS waits 229992927; RW-excl spins 2148711871, OS waits 224643742 Thread 1140881760 is the main innodb thread. It is trying purge old versions of records in the background. Thread 1165752672 is the alter table thread. The purging cannot continue after 600 seconds because purge thread called row_mysql_freeze_data_dictionary() after the alter table is holding the mutex locked in row_mysql_lock_data_dictionary()! That mutex is called dict_operation_lock. This is why the crash happened. The question is why did 'alter table' take so long and not release the mutex? When the crash happened the 'alter table' was here: InnoDB: Thread 1165752672 stopped in file buf0lru.c line 92 void buf_LRU_invalidate_tablespace( /*==========================*/ ulint id) /* in: space id */ { buf_block_t* block; ulint page_no; ibool all_freed; scan_again: mutex_enter(&(buf_pool->mutex)); mutex_enter(&block->mutex); <------ here, waiting Unfortunately, we cannot see exactly which thread is holding the block->mutex mutex locked. I think something like this happened: Let Thread A=innodb purge, Thread B=alter table: Thread A: locks block->mutex Thread B: locks LOCK_open Thread B: locks dict_operation_lock Thread A: tries to lock dict_operation_lock but cannot, so waits Thread B: tries to lock block->mutex but cannot, so waits.
[20 Sep 2008 5:09]
MySQL Verification Team
note: we should check adaptive hash cells and free buffers in all cases in case they are changing.
[8 Oct 2008 18:41]
Inaam Rana
Shane, I have opened bug#39939 to track the buf_LRU_invalidate_tablespace() issue. regards, inaam
[8 May 2009 20:42]
MySQL Verification Team
see also bug #44757
[9 May 2009 0:08]
bhushan uparkar
In http://bugs.mysql.com/bug.php?id=44757, the mysql server was not crashed, but was unresponsive till the table operation was completed.
[13 Nov 2009 17:36]
MySQL Verification Team
this is a duplicate of bug #41158