MySQL Bugs: #25645: Assertion failure in file srv0srv.c; adaptive hash hang in a complex query

Bug #25645	Assertion failure in file srv0srv.c; adaptive hash hang in a complex query
Submitted:	16 Jan 2007 10:19	Modified:	18 Jun 2010 23:01
Reporter:	Eric ten Hoeve	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S1 (Critical)
Version:	5.0.26	OS:	Linux (linux Red Hat)
Assigned to:	Vasil Dimov	CPU Architecture:	Any
Tags:	assertion failure, innodb, Semaphore, srv0srv.c

Description:
Our server crashed with the following error found in mysql.err:

InnoDB: Error: semaphore wait has lasted > 600 seconds
InnoDB: We intentionally crash the server, because it appears to be hung.
070116 10:52:04InnoDB: Assertion failure in thread 1147169088 in file srv0srv.c line 2093
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.0/en/forcing-recovery.html
InnoDB: about forcing recovery.

The database has applied it's redo log after startup and appears to be working as normal.

How to repeat:
Lots of transactions were active at the moment of the crash, so steps to reproduce would be very difficult

Thank you for a problem report. It can be a duplicate of bug #19081. Please, send the results of:

uname -a

(to check is it 64-bit Linux) and your my.cnf file content.

uname -a:
Linux dbhost 2.6.17-1.2139_FC5 #1 SMP Fri Jun 23 12:40:11 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux

my.cnf:

[client]
port=3306
socket=/var/lib/mysql/mysql.sock

[mysqld]
#innodb_force_recovery = 4
skip-slave-start
slave-skip-errors=1396
skip-bdb
log-bin = /var/lib/mysql/mysql-bin.log
log-error = /var/lib/mysql/mysql.err
relay-log = /var/lib/mysql/mysql-relay.log
log-slow-queries = /var/lib/mysql/mysql-slow-query.log
pid-file = /var/lib/mysql/mysql.pid
#binlog-do-db = dd ddtemp
server-id = 1
port=3306
log-slow-queries
log-warnings
max_allowed_packet=32M
default-character-set=latin1
thread_concurrency=4
max_connections=2000
max_user_connections=300
max_connect_errors=10000
interactive_timeout=3600
wait_timeout=1800
connect_timeout=120
skip-external-locking
query-cache-type = 1
query-cache-size = 128M
default-character-set = latin1
default_storage_engine = InnoDB
read_buffer_size=8M
sort_buffer_size=16M
key_buffer_size=128M
table_cache=2000
tmp_table_size=67108864
datadir=/var/lib/mysql
innodb_data_file_path = ibdata1:100M:autoextend
innodb_buffer_pool_size = 4000M
innodb_additional_mem_pool_size=20M
innodb_log_file_size=500M
innodb_log_buffer_size=16M
innodb_flush_log_at_trx_commit=0
innodb_flush_method=O_DSYNC
innodb_file_per_table
innodb_open_files=2048
innodb_data_home_dir=/var/lib/mysql
innodb_log_group_home_dir=/var/lib/mysql
innodb_log_files_in_group=2
innodb_lock_wait_timeout=120

[mysqld_safe]
datadir=/var/lib/mysql
open_files_limit=16384

[mysqldump]
set-variable = max_allowed_packet=32M

[mysqr]

Hi!

Please post the entire .err log, gzipped if it is large.

This might be 'thread thrashing', which is fixed in 5.0.30.

Regards,

Heikki

Hello Heikki,

attached is the gzipped err log

Regards,

Eric

The deadlock comes in the call of this function:

dict0dict.c in 5.0.23:
/************************************************************************
Decrements the count of open MySQL handles to a table. */

void
dict_table_decrement_handle_count(
/*==============================*/
        dict_table_t*   table)  /* in: table */
{
        mutex_enter(&(dict_sys->mutex));

        ut_a(table->n_mysql_handles_opened > 0);

        table->n_mysql_handles_opened--;

        mutex_exit(&(dict_sys->mutex));
}

It is probably the call row_prebuilt_free() below that causes the call to dict_table_decrement_handle_count().

ha_innodb.cc:

/**********************************************************************
Closes a handle to an InnoDB table. */

int
ha_innobase::close(void)
/*====================*/
                                /* out: 0 */
{
        DBUG_ENTER("ha_innobase::close");

        row_prebuilt_free((row_prebuilt_t*) innobase_prebuilt);

        my_free((gptr) upd_buff, MYF(0));
        free_share(share);

        /* Tell InnoDB server that there might be work for
        utility threads: */

        srv_active_wake_master_thread();

        DBUG_RETURN(0);
}

Hmm... we could add to the start of the above function:

innobase_release_temporary_latches(current_thd);

The join query is on 4 tables. But the printout says that MySQL has 5 tables in use. Maybe MySQL opens some table several times, and at the query end closes the table handle before calling ::external_lock() on the rest of tables:

::external_lock() in ha_innodb.cc:
        /* MySQL is releasing a table lock */

        trx->n_mysql_tables_in_use--;
        prebuilt->mysql_has_locked = FALSE;

        /* If the MySQL lock count drops to zero we know that the current SQL
        statement has ended */

        if (trx->n_mysql_tables_in_use == 0) {

                trx->mysql_n_tables_locked = 0;
                prebuilt->used_in_HANDLER = FALSE;

                /* Release a possible FIFO ticket and search latch. Since we
                may reserve the kernel mutex, we have to release the search
                system latch first to obey the latching order. */

                innobase_release_stat_resources(trx);

                if (!(thd->options & (OPTION_NOT_AUTOCOMMIT | OPTION_BEGIN))) {
                        if (trx->active_trans != 0) {
                                innobase_commit(thd, TRUE);
                        }
                } else {
                        if (trx->isolation_level <= TRX_ISO_READ_COMMITTED
                                                && trx->global_read_view) {

                                /* At low transaction isolation levels we let
                                each consistent read set its own snapshot */

                                read_view_close_for_mysql(trx);
                        }
                }
        }

        if (trx->n_mysql_tables_in_use == 0) {

                trx->mysql_n_tables_locked = 0;
                prebuilt->used_in_HANDLER = FALSE;

                /* Release a possible FIFO ticket and search latch. Since we
                may reserve the kernel mutex, we have to release the search
                system latch first to obey the latching order. */

                innobase_release_stat_resources(trx);

                if (!(thd->options & (OPTION_NOT_AUTOCOMMIT | OPTION_BEGIN))) {
                        if (trx->active_trans != 0) {
                                innobase_commit(thd, TRUE);
                        }
                } else {
                        if (trx->isolation_level <= TRX_ISO_READ_COMMITTED
                                                && trx->global_read_view) {

                                /* At low transaction isolation levels we let
                                each consistent read set its own snapshot */

                                read_view_close_for_mysql(trx);
                        }
                }
        }

That could explain the hang. But it is not clear if this is the reason.

We could also move the call:

innobase_release_stat_resources(trx);

outside the 'if' in ::external_lock(). That would add more safety that whatever MySQL does at a query end, there would be no risk of a hang on the btr search latch.

Just for clarification, the status of this bug is "Patch Pending". Does that mean a root cause has been identified and fixed in a patch, or in code "pending" being made into a patch? 

Also, is this 19081, or something different?

We are seeing this on an amd64 system running 5.0.30 as well. If you need more information, logs, etc., we would be more than happy to provide it. This is biting us about twice a week now...

Carl,

the two suggestions I made above are the 'pending patch'. I need to test them, and another developer needs to approve them. I am very optimistic that either of the patches solves this bug.

If you compile mysqld yourself, you can test the above patches.

Bug #19081 is probably an entirely different bug.

Regards,

Heikki

Assigning this to Vasil.

[root@dbhost mysql]# ls -l /var/lib/mysql
total 20080672
-rw-r--r-- 1 root  root            0 Mar  1 16:39 binlog.txt
-rw-rw---- 1 mysql mysql         148 Apr 12 21:02 dbhost-relay-bin.000001
-rw-rw---- 1 mysql mysql          98 Apr 12 21:02 dbhost-relay-bin.000002
-rw-rw---- 1 mysql mysql         117 Apr 13 02:25 dbhost-relay-bin.000003
-rw-rw---- 1 mysql mysql         117 Apr 13 10:45 dbhost-relay-bin.000004
-rw-rw---- 1 mysql mysql        1768 Apr 13 10:45 dbhost-relay-bin.index
lrwxrwxrwx 1 mysql mysql          10 Oct  4  2006 dd -> /mysql/dd/
drwxrwxr-x 2 mysql mysql     3252224 Apr 13 12:30 ddtemp
drwxrwxr-x 2 mysql mysql        4096 Jul 12  2006 ddvast
-rw-r--r-- 1 mysql mysql          31 Jun 19  2006 ibbackup_binlog_info
-rw-r----- 1 mysql mysql 18215862272 Apr 13 12:30 ibdata1
-rw-r----- 1 mysql mysql   524288000 Apr 13 12:30 ib_logfile0
-rw-r----- 1 mysql mysql   524288000 Apr 13 12:30 ib_logfile1
-rw-r--r-- 1 root  root       312605 Mar  1 16:46 log.zip
-rw-rw---- 1 mysql mysql          74 Apr 10 12:37 master.info
drwx--x--x 2 mysql mysql        4096 Apr 13 00:27 mysql
-rw-rw---- 1 mysql mysql   996626413 Apr 12 21:02 mysql-bin.000371
-rw-rw---- 1 mysql mysql    68617586 Apr 13 02:24 mysql-bin.000372
-rw-rw---- 1 mysql mysql   132938389 Apr 13 10:44 mysql-bin.000373
-rw-rw---- 1 mysql mysql    39935096 Apr 13 12:30 mysql-bin.000374
-rw-rw---- 1 mysql mysql         128 Apr 13 10:45 mysql-bin.index
-rw-rw---- 1 mysql mysql       49296 Apr 13 12:30 mysql.err
-rw-rw---- 1 mysql mysql       55088 Apr 12 21:02 mysql.err-old
-rw-rw---- 1 mysql mysql           6 Apr 13 10:45 mysql.pid
-rw-rw---- 1 mysql mysql    11845837 Apr 13 12:24 mysql-slow-query.log
-rw-r----- 1 root  root      3676911 Feb 11 00:39 mysql-slow-query.log_pre34enterpr
-rw-r----- 1 mysql mysql    20578407 Jan 18 09:35 mysql-slow-query.old
srwxrwxrwx 1 mysql mysql           0 Apr 13 10:45 mysql.sock
-rw-rw---- 1 mysql mysql          31 Apr 13 10:45 relay-log.info
drwxr-xr-x 2 mysql mysql        4096 Jan  9 14:37 test
drwxr-xr-x 2 mysql mysql       12288 Apr  4 06:30 tmp

We have a software raid 1+0 setup and we do have a faulty disk. Also deleting the table from our replication server (which runs 5.38 enterprise) succeeds as expected.

sorry, the last entry was for another bug. Please disregard the entry.

I am pretty sure that these two changes fix the problem, if this is a bug in MySQL/InnoDB.

I do not know if this bug is from MySQL closing the table handle in an abnormal order, or is there something else wrong in the hardware/OS/mysqld.

Regards,

Heikki

Queued to 5.0- and 5.1-maint team trees

Pushed into 5.1.21-beta

Pushed into 5.0.48

Noted in 5.0.48, 5.1.21 changelogs.

Several InnoDB assertion failures were corrected.

we had mysql 5.0.45 crashed on us with similar problems pretty frequently some time ago. I checked out the latest mysql code from bitkeeper and it seems to be a little better, but mysql still crashes several times a day. Now the error is:
----------------------------
END OF INNODB MONITOR OUTPUT
============================
InnoDB: ###### Diagnostic info printed to the standard error stream
InnoDB: Error: semaphore wait has lasted > 600 seconds
InnoDB: We intentionally crash the server, because it appears to be hung.
070830  7:07:52InnoDB: Assertion failure in thread 138661376 in file srv0srv.c line 2095
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.0/en/forcing-recovery.html
InnoDB: about forcing recovery.
070830  7:07:52 - mysqld got signal 11;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.
---

I can provide more info about the crash.

Updating the synopsis.

This bug often occurs when a complex query calls ::close() while it still holds the adaptive hash latch.

See http://bugs.mysql.com/bug.php?id=20358

Pushed into 5.1.47 (revid:joro@sun.com-20100505145753-ivlt4hclbrjy8eye) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)

Push resulted from incorporation of InnoDB tree. No changes pertinent to this bug. Re-closing.

Pushed into mysql-next-mr (revid:alik@sun.com-20100524190136-egaq7e8zgkwb9aqi) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (pib:16)

Pushed into 6.0.14-alpha (revid:alik@sun.com-20100524190941-nuudpx60if25wsvx) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)

Pushed into 5.5.5-m3 (revid:alik@sun.com-20100524185725-c8k5q7v60i5nix3t) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)

Push resulted from incorporation of InnoDB tree. No changes pertinent to this bug.
Re-closing.

Pushed into 5.1.47-ndb-7.0.16 (revid:martin.skold@mysql.com-20100617114014-bva0dy24yyd67697) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)

Pushed into 5.1.47-ndb-6.2.19 (revid:martin.skold@mysql.com-20100617115448-idrbic6gbki37h1c) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)

Pushed into 5.1.47-ndb-6.3.35 (revid:martin.skold@mysql.com-20100617114611-61aqbb52j752y116) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)