Bug #54742 Crash while dropping a table, unable to recover
Submitted: 23 Jun 2010 15:06 Modified: 30 Dec 2012 10:15
Reporter: Tomas Are Haavet Email Updates:
Status: Can't repeat Impact on me:
Category:MySQL Server: InnoDB storage engine Severity:S2 (Serious)
Version:5.5.4-m3 OS:Linux
Assigned to: Sveta Smirnova CPU Architecture:Any

[23 Jun 2010 15:06] Tomas Are Haavet
When dropping an 1.4TB innodb table in a heavily used database, the mysql server froze (as reported in bug http://bugs.mysql.com/bug.php?id=44757), but queries where queued and disk space continually freed so everything looked OK. But in the end the mysql server figured out a semaphore had waited too long (600sec) and intentionally crashed the server.

During roll back, we got the following error message for the dropped table 'agent/LinkObs_agent1':

100622 17:27:23  InnoDB: Rolling back trx with id 189287440B, 24 rows to undo
InnoDB: Dropping table with id 0 37 in recovery if it exists
100622 17:27:27  InnoDB: error: space object of table'agent/LinkObs_agent1',
InnoDB: space id 25 did not exist in memory. Retrying an open.
100622 17:27:27  InnoDB: Assertion failure in thread 47807554208944 in file fil/fil0fil.c line 3001
InnoDB: Failing assertion: flags != DICT_TF_COMPACT
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.1/en/forcing-recovery.html
InnoDB: about forcing recovery.
100622 17:27:27 - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 44062624 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = (nil) thread_stack 0x40000
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
100622 17:27:29 mysqld_safe mysqld from pid file /var/lib/mysql/thales.opoint.com.pid ended

How to repeat:
Drop a huge innodb table on a heavily used mysql server. Try to recover when it crashes.
[23 Jun 2010 15:12] Tomas Are Haavet
Snippet from mysql server error log file

Attachment: err.log (text/x-log), 6.66 KiB.

[23 Jun 2010 15:13] Tomas Are Haavet
my.cnf file

Attachment: my.cnf (application/octet-stream, text), 2.26 KiB.

[24 Jun 2010 11:35] Sveta Smirnova
Thank you for the report.

I can not repeat described behavior using artificial test: I get error "table not exists", but server is running and did not crash again. Please enable core files and next time when server crashes while dropping large table upload core file and mysqld to our FTP server. I want to check where DROP TABLE stopped working.
[28 Jun 2010 8:08] Tomas Are Haavet
Thanks for your quick response and trying to reproduce. So your server did not crash while dropping the table? Did it take more than 600sec? I'm pretty sure my query were running for 15-20min before the server crashed itself.

We can enable core files and report back if it happens again, but I highly doubt we will be in this situation again, as it is not a every day situation and could put us out of business if we can not recover... :) Would it be better to truncate the table first?

Anyway, to get the server up and running again and without much in-depth knowledge of the mysql internals, we figured we had to skip recovery of the transaction against the dropped table, but we couldn't find any information about this online. So we ended up temporary commenting out lines 503-516 in storage/innobase/trx/trx0roll.c (the call to dict_table_get_on_id_low()) to skip it. Could this have been done another way?
[28 Jun 2010 8:50] Sveta Smirnova
Thank you for the feedback.

I modified source code to let DROP TABLE run for 600 seconds. Problem is crash was when some event occurred during drop table. This event is hard to guess, but core file should show where DROP TABLE stopped. Also is not clear if semaphore wait was due to long-running DROP command, or occurred during other operations.
[28 Jul 2010 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[8 Sep 2010 16:00] Mark Callaghan
This might be http://bugs.mysql.com/bug.php?id=54009 and is similar to http://bugs.mysql.com/bug.php?id=56373. The fix might be to mask DICT_TF_COMPACT when fil_open_single_table_tablespace is called.
[30 Dec 2012 10:15] MySQL Verification Team
Hi Thomas,

Please use 5.6.9 and check if problems persist.  5.5.4 is hardly worthy of production load.  See also bug #54009 and bug #56373