MySQL Bugs: #55722: Drop a large innodb table hung the server and lead to server crash

Bug #55722	Drop a large innodb table hung the server and lead to server crash
Submitted:	3 Aug 2010 21:45	Modified:	9 Aug 2010 21:42
Reporter:	Peiran Song	Email Updates:
Status:	Duplicate	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S2 (Serious)
Version:	5.1.42, 5.1.45	OS:	Solaris (also Linux Debian)
Assigned to:		CPU Architecture:	Any

Description:
We experienced very slow 'DROP TABLE' operations where all other queries were hang at 'Opening tables' state. In one case, the semaphore wait was too long and the server crashed.

Here is the details:

== Case 1: 5.1.42, without innodb plugin open solaris, zfs

The table had 70 million rows, .ibd file was 6.4G, not in use. The server was in production taking writes.

The 'drop table' operation took over 600 seconds, and blocked main thread in 'flushing buffer pool' state, and all other queries in 'Opening tables'.

--Thread 9 has waited at buf/buf0flu.c line 607 for 253.00 seconds the semaphore:
S-lock on RW-latch at 9143aec10 created in file buf/buf0buf.c line 547
a writer (thread id 22746) has reserved it in mode exclusive

thread 9: main thread doing flushing buffer pool
thread 22746: drop table
buf/buf0flu.c line 607 : rw_lock_s_lock_gen(&(block->lock), BUF_IO_WRITE);
buf/buf0buf.c line 547 : rw_lock_create(&block->lock, SYNC_LEVEL_VARYING);

Did all other queries at 'Opening tables' state actually waiting for main thread to process them?

== Case 2 : 5.1.45 percona, xtradb, Linux 2.6.26, Debian, xfs

The table had ~30 million rows, .ibd file 5G. Dropping it on a server with only reads took a couple seconds each. But on a server with read+write (300~400 QPS), it took 300 seconds while blocking other queries. The semaphore appears different from Case 1:

--Thread 1258379600 has waited at row/row0purge.c line 541 for 241.00 seconds the semaphore:
S-lock on RW-latch at 0xe03f60 '&dict_operation_lock'
a writer (thread id 1449539920) has reserved it in mode exclusive

thread 1258379600 : unknown
thread 1449539920 : drop table

I will attach "show innodb status" output in error log files for each case.

Thank you for investigation/fix.

How to repeat:
Drop a large table on a server with writes going on.

Suggested fix:

I saw another similar bug report but on a different semaphore. Hope all the extended global mutex during a drop table operation could all be fixed. :)

short version of the innodb status output

Attachment: case_1.err.short (application/octet-stream, text), 13.00 KiB.

short version of innodb status output for case 2

Attachment: case_2.err.short (application/octet-stream, text), 17.55 KiB.

Thank you for the report.

How much physical RAM do you have?

48G

In Case 1, innodb_buffer_pool_size = 35G
In Case 2, innodb_buffer_pool_size = 38G

Thanks for looking into it.

Thank you for the feedback.

Error log complained about 16G+ RAM buffer usage, with InnoDB pool you have you can reach RAM outage, but as crash happened due to long semaphor wait this is probably not the case.

Which table have you dropped? Could you please send us full error log for scenario 1 (our server)?

Thank you. I uploaded the full error log file from Case 1, bug-55722-case1_errorlog.gz, to the ftp site.

Thank you for the feedback.

This is really duplicate of bug #41158. Please subscribe to this bug to get updates when package with fix is available.

Thanks. In my case, the tables in question were not very very big, and were not using ext3. And the semaphore in two cases were different which seems suggesting two issues. 
Anyway, I updated the other bug. And glad to see the fix coming out!