MySQL Bugs: #37963: all threads wait in 'opening tables' or 'closing tables' status on mutex/lock

Bug #37963	all threads wait in 'opening tables' or 'closing tables' status on mutex/lock
Submitted:	8 Jul 2008 11:06	Modified:	13 Nov 2009 17:36
Reporter:	Shane Bester (Platinum Quality Contributor)	Email Updates:
Status:	Duplicate	Impact on me:	None
Category:	MySQL Server: Locking	Severity:	S2 (Serious)
Version:	5.x	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
This could be similar to bug #25044 or bug #37346 or else a new feature request.

Running a 'drop table' might take a while, depending on storage engine specifications.  In this specific case 'drop table' had been waiting for innodb to process it.  During this time, innodb status showed:

InnoDB: Warning: a long semaphore wait:
--Thread 1192802624 has waited at row/row0mysql.c line 1711 for 241.00 seconds the semaphore:
X-lock on RW-latch at 0xc5de80 created in file dict/dict0dict.c line 737
a writer (thread id 1192802624) has reserved it in mode  wait exclusive
number of readers 1, waiters flag 1
Last time read locked in file row/row0mysql.c line 1676
Last time write locked in file row/row0mysql.c line 1711

Purge thread happened to be running at the same time.

The problem:
All other threads (even ones selecting from myisam tables in different database) had to wait for the drop table to complete before they got out of "opening tables" state.

How to repeat:
view the code/innodb status/show processlist/locking design

Suggested fix:
Is there some way innodb or other storage engines can not acquire a global lock for too long?  Is it needed to lock mutex LOCK_open for too long?

This specific occurrence is easily reproducible by adding a sleep() into trx_purge(void) function in ./trx/trx0purge.c here:

rw_lock_x_lock(&(purge_sys->latch));

mutex_enter(&kernel_mutex);
printf("stopping in trx_purge() for 120 seconds\n");
sleep(120);
printf("about to continue..\n");
/* Close and free the old purge view */

now we have access to a myisam blocked due to innodb taking too long:

 Command | Time | State          | Info
---------+------+----------------+-----------------------------------------
 Query   |  108 | Opening tables | select * from mysql.user limit 1

my analysis of hang in csc29327

In one connection, you have the alter table running:

---TRANSACTION 5 830089219, not started, process no 10333, OS thread id 1165752672 discarding tablespace
mysql tables in use 1, locked 1
MySQL thread id 27849451, query id 45934707873 localhost root
ALTER TABLE posiciones_tbl DISCARD TABLESPACE

Notice that all other connections are stuck in "Opening tables" state.
Even queries that access another completely different table! They are hung in
mysql code waiting for the global LOCK_open mutex held by the hung 'alter table'.

Then we have semaphore waits within innodb, so lets see why it got stuck.

----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 1476801360, signal count 1141393786
--Thread 1157642592 has waited at dict0dict.c line 751 for 950.00 seconds the semaphore:
Mutex at 0x2aaab8b798b8 created file dict0dict.c line 709, lock var 1
waiters flag 1
--Thread 1140881760 has waited at row0mysql.c line 1720 for 946.00 seconds the semaphore:
S-lock on RW-latch at 0xe0d620 created in file dict0dict.c line 725
a writer (thread id 1165752672) has reserved it in mode exclusive
number of readers 0, waiters flag 1
Last time read locked in file row0mysql.c line 1720
Last time write locked in file row0mysql.c line 1755
--Thread 1168320864 has waited at dict0dict.c line 781 for 950.00 seconds the semaphore:
Mutex at 0x2aaab8b798b8 created file dict0dict.c line 709, lock var 1
waiters flag 1
Mutex spin waits 0, rounds 79151124235, OS waits 942314335
RW-shared spins 627358234, OS waits 229992927; RW-excl spins 2148711871, OS waits 224643742

Thread 1140881760 is the main innodb thread. It is trying purge old versions of records in the background.
Thread 1165752672 is the alter table thread.

The purging cannot continue after 600 seconds because purge thread called
row_mysql_freeze_data_dictionary() after the alter table is holding the mutex locked
in row_mysql_lock_data_dictionary()! That mutex is called dict_operation_lock.

This is why the crash happened. The question is why did 'alter table' take
so long and not release the mutex? When the crash happened the 'alter table' was here:

InnoDB: Thread 1165752672 stopped in file buf0lru.c line 92

void
buf_LRU_invalidate_tablespace(
/*==========================*/
ulint id) /* in: space id */
{
buf_block_t* block;
ulint page_no;
ibool all_freed;

scan_again:
mutex_enter(&(buf_pool->mutex));
mutex_enter(&block->mutex); <------ here, waiting

Unfortunately, we cannot see exactly which thread is holding the block->mutex mutex
locked. I think something like this happened:

Let Thread A=innodb purge, Thread B=alter table:

Thread A: locks block->mutex
Thread B: locks LOCK_open
Thread B: locks dict_operation_lock
Thread A: tries to lock dict_operation_lock but cannot, so waits
Thread B: tries to lock block->mutex but cannot, so waits.

note: we should check adaptive hash cells and free buffers in all cases in case they are changing.

Shane,

I have opened bug#39939 to track the buf_LRU_invalidate_tablespace() issue.

regards,
inaam

see also bug #44757

In http://bugs.mysql.com/bug.php?id=44757, the mysql server was not crashed, but was unresponsive till the table operation was completed.

this is a duplicate of bug #41158