Bug #19727 InnoDB crashed server and crashed tables are not recoverable
Submitted: 11 May 2006 14:17 Modified: 18 Jun 2010 12:43
Reporter: Gabriel Barazer Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S1 (Critical)
Version:4.1, 5.0, 5.1 OS:Linux (Linux 2.6)
Assigned to: CPU Architecture:Any

[11 May 2006 14:17] Gabriel Barazer
Description:
Here is the Error log :
060510 19:58:01  InnoDB: Error: a record lock wait happens in a dictionary operation!
InnoDB: Table name `SYS_TABLES`.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com
060510 19:58:01InnoDB: Assertion failure in thread 2691017648 in file trx0trx.c line 336
InnoDB: Failing assertion: trx->wait_lock == NULL
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/mysql/en/Forcing_recovery.html
InnoDB: about forcing recovery.
mysqld got signal 11;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=67108864
read_buffer_size=2093056
max_used_connections=364
max_connections=500
threads_connected=12
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 694316 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd=0xa2f3f7e0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
Cannot determine thread, fp=0xa0657f5c, backtrace may not be correct.
Stack range sanity check OK, backtrace follows:
0x8171f24
0x7598a8
(nil)
0x821a5eb
0x821076a
0x8222434
0x8186865
0x818d63e
0x8184c82
0x81847ad
0x8183cb5
0x7531d5
0x66d2da
New value of fp=(nil) failed sanity check, terminating stack trace!
Please read http://dev.mysql.com/doc/mysql/en/Using_stack_trace.html and follow instructions on how to resolve the stack trace. Resolved
stack trace is much more helpful in diagnosing the problem, so please do
resolve it
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at 0xa2b1f998  is invalid pointer
thd->thread_id=17890351
The manual page at http://www.mysql.com/doc/en/Crashing.html contains
information that should help you find out what is causing the crash.
InnoDB: Thread 2730863536 stopped in file ../include/sync0sync.ic line 111
InnoDB: Thread 2660981680 stopped in file ha_innodb.cc line 973

After the server is restarted by safe_mysqld :
060511  3:09:27060511  3:09:27 [ERROR] Cannot find table live/live_log_raw_wip_446229a9107a6_tmp from the internal data dictionary

when trying to access the table.

How to repeat:
To repeat the table access error, juste try to SELECT or anything else on the corrupted table.
To repeat the crash, no idea since this happened without any reason
[11 May 2006 14:27] Heikki Tuuri
Gabriel,

please attach the complete .err log, maybe gzipped, to this bug report.

Regards,

Heikki
[11 May 2006 14:40] Heikki Tuuri
There is a bug in 5 functions in row0mysql.c. We should commit the trx BEFORE calling unlock_data_dictionary(), since we assume in InnoDB's internal SQL interpreter that transactional lock waits can never happen. The whole data dict operation just be protected then by the dictionary semaphore.

Assigning Osku to fix this in 4.1, 5.0, 5.1.

row0mysql.c:

/*************************************************************************
Discards the tablespace of a table which stored in an .ibd file. Discarding
means that this function deletes the .ibd file and assigns a new table id for
the table. Also the flag table->ibd_file_missing is set TRUE. */

int
row_discard_tablespace_for_mysql(
/*=============================*/
                                /* out: error code or DB_SUCCESS */
        const char*     name,   /* in: table name */
        trx_t*          trx)    /* in: transaction handle */
{
...
funct_exit:
        row_mysql_unlock_data_dictionary(trx);

        if (graph) {
                que_graph_free(graph);
        }

        trx_commit_for_mysql(trx);

        trx->op_info = "";

        return((int) err);
}

/*********************************************************************
Imports a tablespace. The space id in the .ibd file must match the space id
of the table in the data dictionary. */

int
row_import_tablespace_for_mysql(
/*============================*/
                                /* out: error code or DB_SUCCESS */
        const char*     name,   /* in: table name */
        trx_t*          trx)    /* in: transaction handle */
{

...

funct_exit:
        row_mysql_unlock_data_dictionary(trx);

        trx_commit_for_mysql(trx);

        trx->op_info = "";

        return((int) err);
}

/*************************************************************************
Drops a table for MySQL. If the name of the table to be dropped is equal
with one of the predefined magic table names, then this also stops printing
the corresponding monitor output by the master thread. */

int
row_drop_table_for_mysql(
/*=====================*/
                                /* out: error code or DB_SUCCESS */
        const char*     name,   /* in: table name */
        trx_t*          trx,    /* in: transaction handle */
        ibool           drop_db)/* in: TRUE=dropping whole database */
{
...
funct_exit:

        if (locked_dictionary) {
                row_mysql_unlock_data_dictionary(trx);
        }

        if (dir_path_of_temp_table) {
                mem_free(dir_path_of_temp_table);
        }

        que_graph_free(graph);

        trx_commit_for_mysql(trx);

        trx->op_info = "";

#ifndef UNIV_HOTBACKUP
        srv_wake_master_thread();
#endif /* !UNIV_HOTBACKUP */

        return((int) err);
}

/*************************************************************************
Drops a database for MySQL. */

int
row_drop_database_for_mysql(
/*========================*/
                                /* out: error code or DB_SUCCESS */
        const char*     name,   /* in: database name which ends to '/' */
        trx_t*          trx)    /* in: transaction handle */
{
...
        row_mysql_unlock_data_dictionary(trx);

        trx_commit_for_mysql(trx);

        trx->op_info = "";

        return(err);
}

/*************************************************************************
Renames a table for MySQL. */

int
row_rename_table_for_mysql(
/*=======================*/
                                        /* out: error code or DB_SUCCESS */
        const char*     old_name,       /* in: old table name */
        const char*     new_name,       /* in: new table name */
        trx_t*          trx)            /* in: transaction handle */
{
...

funct_exit:
        if (!recovering_temp_table) {
                row_mysql_unlock_data_dictionary(trx);
        }

        if (graph) {
                que_graph_free(graph);
        }

        if (UNIV_LIKELY_NULL(heap)) {
                mem_heap_free(heap);
        }

        trx_commit_for_mysql(trx);

        trx->op_info = "";

        return((int) err);
}
[11 May 2006 14:43] Gabriel Barazer
.err Log sent.
This log begins at the time I upgraded my 4.1 server to 5.0.21 (cleared the old 4.1 log (.err before starting the 5.0.21 server)
As I  said in the previous comment, I run a cron job which "rotates" the tables every minute (CREATE LIKE, double RENAME)
When repairing the server (I remarked the bug only today), I had to remove some .frm manually, and create a table in another database , then copy the .frm then DROP TABLE (on one hand data wasn't in the internal log but had .frm, on the other hand , data was in the internal log but without .frm)
I think the out of sync is done if the server crashed during rename. There must be a problem with anything done on the .frm _before_ the innodb internal is updated/commited.
[11 May 2006 14:43] Heikki Tuuri
Gabriel,

since mysqld probably crashed in a RENAME, the .frm files and the tables inside InnoDB's internal data dictionary are not in sync. To access the table you need to fix this sync problem. Please see
http://dev.mysql.com/doc/refman/5.0/en/innodb-troubleshooting-datadict.html

Regards,

Heikki
[11 May 2006 14:46] Gabriel Barazer
Sorry for the previous comment, you posted faster than me :)

I think this bug can be closed now.

Thanks !
[11 May 2006 14:48] Heikki Tuuri
Gabriel,

we cannot close the bug since Osku really has to fix the 5 bugs in row0mysql.c.

The lock wait and crash is improbable. That is why no one has reported this timing problem before.

Thank you!

Heikki
[12 May 2006 12:14] Osku Salerma
Fixed in 4.1, 5.0 and 5.1 in InnoDB's Subversion repositories.
[31 May 2006 20:57] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/7116
[1 Jun 2006 10:16] Alexander Ivanov
Pushed to 5.1.11
[6 Jun 2006 8:37] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/7312
[6 Jun 2006 9:10] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/7314
[6 Jun 2006 9:49] Alexander Ivanov
Pushed to 4.1.20, 5.0.22.
[20 Jun 2006 2:29] Lars Thalmann
Pushed into 4.1.21
[21 Jun 2006 14:21] Lars Thalmann
Pushed into 5.1.12
[22 Jun 2006 15:48] Paul DuBois
Noted in 4.1.21, 5.0.23, 5.1.12.

InnoDB unlocked its data directory before committing a transaction,
potentially resulting in non-recoverable tables if a server crash
occurred before the commit.
[5 May 2010 15:03] Bugs System
Pushed into 5.1.47 (revid:joro@sun.com-20100505145753-ivlt4hclbrjy8eye) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[6 May 2010 2:08] Paul DuBois
Push resulted from incorporation of InnoDB tree. No changes pertinent to this bug. Re-closing.
[28 May 2010 6:04] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100524190136-egaq7e8zgkwb9aqi) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (pib:16)
[28 May 2010 6:33] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20100524190941-nuudpx60if25wsvx) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[28 May 2010 7:01] Bugs System
Pushed into 5.5.5-m3 (revid:alik@sun.com-20100524185725-c8k5q7v60i5nix3t) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[29 May 2010 2:47] Paul DuBois
Push resulted from incorporation of InnoDB tree. No changes pertinent to this bug.
Re-closing.
[17 Jun 2010 12:09] Bugs System
Pushed into 5.1.47-ndb-7.0.16 (revid:martin.skold@mysql.com-20100617114014-bva0dy24yyd67697) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[17 Jun 2010 12:56] Bugs System
Pushed into 5.1.47-ndb-6.2.19 (revid:martin.skold@mysql.com-20100617115448-idrbic6gbki37h1c) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[17 Jun 2010 13:36] Bugs System
Pushed into 5.1.47-ndb-6.3.35 (revid:martin.skold@mysql.com-20100617114611-61aqbb52j752y116) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)