Bug #20358 InnoDB hang on the adaptive hash index latch in btr0sea.c
Submitted: 9 Jun 2006 13:31 Modified: 12 Jan 2011 23:02
Reporter: Heikki Tuuri Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S2 (Serious)
Version:4.1.12, 4.1.20,5.0.17,5.0.44,5.0.58,5.0.60 OS:Any (64-bit Linux-2.4 and 2.6)
Assigned to: James Day CPU Architecture:Any

[9 Jun 2006 13:31] Heikki Tuuri
Description:
All threads inside InnoDB end up waiting for &btr_search_latch.

There is one reader there who never releases his S-latch.

A 'long semaphore wait' makes InnoDB to crash mysqld voluntarily.

How to repeat:
Do not know.

Suggested fix:
Need to add assertions to catch this bug earlier.
[9 Jun 2006 15:24] Heikki Tuuri
This may be associated with http://bugs.mysql.com/bug.php?id=19081 if there is a bug in sync0*.
[12 Jun 2006 13:29] Heikki Tuuri
A new similar hang report from 4.1.18.

SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 1357348581, signal count 808329520
--Thread 1211967840 has waited at btr0sea.c line 480 for 710.00 seconds the semaphore:
X-lock on RW-latch at 0x2a9e76ccb8 created in file btr0sea.c line 139
a writer (thread id 1211967840) has reserved it in mode  wait exclusive
number of readers 1, waiters flag 1
Last time read locked in file btr0sea.c line 745
Last time write locked in file btr0cur.c line 1404
--Thread 1281812832 has waited at row0sel.c line 3099 for 710.00 seconds the semaphore:
S-lock on RW-latch at 0x2a9e76ccb8 created in file btr0sea.c line 139
a writer (thread id 1211967840) has reserved it in mode  wait exclusive
number of readers 1, waiters flag 1
Last time read locked in file btr0sea.c line 745
Last time write locked in file btr0cur.c line 1404
--Thread 1147140448 has waited at btr0cur.c line 1404 for 710.00 seconds the semaphore:
X-lock on RW-latch at 0x2a9e76ccb8 created in file btr0sea.c line 139
a writer (thread id 1211967840) has reserved it in mode  wait exclusive
number of readers 1, waiters flag 1
Last time read locked in file btr0sea.c line 745
Last time write locked in file btr0cur.c line 1404
--Thread 1264752992 has waited at btr0cur.c line 1404 for 710.00 seconds the semaphore:
X-lock on RW-latch at 0x2a9e76ccb8 created in file btr0sea.c line 139
a writer (thread id 1211967840) has reserved it in mode  wait exclusive
number of readers 1, waiters flag 1
Last time read locked in file btr0sea.c line 745
Last time write locked in file btr0cur.c line 1404

...
[12 Jun 2006 14:25] Heikki Tuuri
Both hangs are from a 64-bit Linux:

2.6.5-7.252-smp #1 SMP Tue Feb 14 11:11:04 UTC 2006 x86_64 x86_64 x86_64 GNU/Linux 

2.4.21-37.ELsmp #1 SMP Wed Sep 7 13:32:18 EDT 2005 x86_64 x86_64 x86_64 GNU/Linux
[12 Jun 2006 16:10] Heikki Tuuri
This is mysterious. I checked the 4.1.18 source code. We set an S-latch on btr_search_latch in a few places in btr0sea.c, but release it in all cases.

If trx->has_search_latch is non-zero, then we set an S-latch in row0sel.c and btr0cur.c, but in the printout from 4.1.18 no transaction has that flag set.

InnoDB's internal SQL interpreter does in some cases set an S-latch in row0sel.c, but the code there seems to be ok, and furthermore, no S-latch should be set there anyway because in 4.1 InnoDB's internal interpreter does all the reads with locking reads.

This problem might have something to do with 64-bit hardware. In 2 months two hang reports pop up from 64-bit Linux computers. 4.1.12 was released a year ago. Why do we get hang reports only now?
[12 Jun 2006 20:10] Heikki Tuuri
Google Groups search did not find any relevant changes in btr0*, sync0rw, row0sel.c in 4.1 in the past 18 months.
[27 Jul 2006 16:37] Heikki Tuuri
Putting this to the 'Can't repeat' state until we get more reports of this hang. The primary suspect currently is a hardware problem in 4-socket Opteron setups.
[23 Feb 2007 11:49] Nils Juenemann
Same Problem on Linux 2.6.9-42.0.3.ELsmp #1 SMP Mon Sep 25 17:24:31 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux with MySQL 5.0.22

OS WAIT ARRAY INFO: reservation count 102715884, signal count 90051532
--Thread 1466497376 has waited at btr0sea.c line 489 for 247.00 seconds the semaphore:
X-lock on RW-latch at 0x2a963b72b8 created in file btr0sea.c line 139
a writer (thread id 1466497376) has reserved it in mode  wait exclusive
number of readers 1, waiters flag 1
Last time read locked in file row0sel.c line 3319
Last time write locked in file btr0sea.c line 489
--Thread 1287584096 has waited at row0sel.c line 3319 for 247.00 seconds the semaphore:
S-lock on RW-latch at 0x2a963b72b8 created in file btr0sea.c line 139
a writer (thread id 1466497376) has reserved it in mode  wait exclusive
number of readers 1, waiters flag 1
Last time read locked in file row0sel.c line 3319
Last time write locked in file btr0sea.c line 489
--Thread 1216498016 has waited at row0sel.c line 3319 for 247.00 seconds the semaphore:
S-lock on RW-latch at 0x2a963b72b8 created in file btr0sea.c line 139
a writer (thread id 1466497376) has reserved it in mode  wait exclusive
number of readers 1, waiters flag 1
Last time read locked in file row0sel.c line 3319
Last time write locked in file btr0sea.c line 489
--Thread 1289714016 has waited at row0sel.c line 3319 for 247.00 seconds the semaphore:
S-lock on RW-latch at 0x2a963b72b8 created in file btr0sea.c line 139
a writer (thread id 1466497376) has reserved it in mode  wait exclusive
number of readers 1, waiters flag 1
Last time read locked in file row0sel.c line 3319
Last time write locked in file btr0sea.c line 489
--Thread 1233803616 has waited at row0sel.c line 3319 for 247.00 seconds the semaphore:
S-lock on RW-latch at 0x2a963b72b8 created in file btr0sea.c line 139
a writer (thread id 1466497376) has reserved it in mode  wait exclusive
number of readers 1, waiters flag 1
Last time read locked in file row0sel.c line 3319
Last time write locked in file btr0sea.c line 489

[..]

Last time read locked in file row0sel.c line 3319
Last time write locked in file btr0sea.c line 489
Mutex spin waits 13609520290, rounds 42376139135, OS waits 60100204
RW-shared spins 24335631, OS waits 5623106; RW-excl spins 38994315, OS waits 2273361
[30 Aug 2007 21:12] Igor Green
same problem here. OS - FreeBSD 6.2. on a 4-way dell box. 
happens to mysql version 5.0.45 and 5.0.50-beta

----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 120453, signal count 100913
--Thread 796863232 has waited at btr0sea.c line 489 for 64.00 seconds the semaphore:
X-lock on RW-latch at 0xa4e0a60 created in file btr0sea.c line 139
a writer (thread id 796863232) has reserved it in mode  wait exclusive
number of readers 1, waiters flag 1
Last time read locked in file btr0sea.c line 746
Last time write locked in file btr0sea.c line 489
--Thread 804406528 has waited at ./../include/sync0rw.ic line 361 for 64.00 seconds the semaphore:
Mutex at 0xa4e0a74 created file btr0sea.c line 139, lock var 0
waiters flag 0
--Thread 819695616 has waited at btr0sea.c line 746 for 64.00 seconds the semaphore:
S-lock on RW-latch at 0xa4e0a60 created in file btr0sea.c line 139
a writer (thread id 796863232) has reserved it in mode  wait exclusive
number of readers 1, waiters flag 1
Last time read locked in file btr0sea.c line 746
Last time write locked in file btr0sea.c line 489
--Thread 792090368 has waited at btr0cur.c line 1484 for 64.00 seconds the semaphore:
X-lock on RW-latch at 0xa4e0a60 created in file btr0sea.c line 139
a writer (thread id 796863232) has reserved it in mode  wait exclusive
number of readers 1, waiters flag 1
Last time read locked in file btr0sea.c line 746
Last time write locked in file btr0sea.c line 489
--Thread 818813184 has waited at btr0sea.c line 1127 for 64.00 seconds the semaphore:
S-lock on RW-latch at 0xa4e0a60 created in file btr0sea.c line 139
a writer (thread id 796863232) has reserved it in mode  wait exclusive
number of readers 1, waiters flag 1
Last time read locked in file btr0sea.c line 746
Last time write locked in file btr0sea.c line 489

<and so on>
[5 Sep 2007 3:02] Arjen Lentz
Hi Heikki - any progress on this problem?
It's been open since 2006, and is quite a serious problem.
[5 Sep 2007 3:14] Jeremy Cole
Hi Guys,

Looks like I saw this bug today.

Here's the semaphores that seem to be involved, I've added the InnoDB function names and variable names to clarify the file and line numbers:

--Thread 1159866720 has waited at dict0dict.c line 298 (in dict_table_decrement_handle_count) for 258.00 seconds the semaphore:
Mutex at 0x2aa6adb2b8 (dict_sys->mutex) created file dict0dict.c line 709 (in dict_init), lock var 1
waiters flag 1

--Thread 1158461792 has waited at btr0sea.c line 1383 (btr_search_update_hash_on_delete) for 258.00 seconds the semaphore:
X-lock on RW-latch at 0x2aa6ad20b8 (in btr_search_latch) created in file btr0sea.c line 139 (in btr_search_sys_create)
a writer (thread id 1158461792) has reserved it in mode  wait exclusive
number of readers 1, waiters flag 1
Last time read locked in file btr0sea.c line 746 (in btr_search_guess_on_hash)
Last time write locked in file btr0sea.c line 1529 (in btr_search_update_hash_on_insert)

Here are the transactions involved:

---TRANSACTION 0 1760648916, ACTIVE 258 sec, process no 15781, OS thread id 1158461792 inserting, thread declared inside InnoDB 438
mysql tables in use 1, locked 1
ROLLING BACK 14 lock struct(s), heap size 3024, undo log entries 4
MySQL thread id 3, query id 50083323 update
insert into proxy_prefs (userid, item_id, rating) values
                   (23107341,13442882,3.5), ... <many rows snipped> ...
on duplicate key update rating = if(proxy_prefs.rating > values(rating),                 proxy_prefs.rating, values(rating));

---TRANSACTION 0 1760648698, ACTIVE 258 sec, process no 15781, OS thread id 1159866720, thread declared inside InnoDB 495
mysql tables in use 3, locked 0
, holds adaptive hash latch
MySQL thread id 18954, query id 50083043 192.168.110.138 reader Sending data
select count(item_similarity.item_id_l) as count from item_similarity, item where (item_similarity.item_id_l=item.id or item_similarity.item_id_r=item.id) and item.publisher_id=1237
Trx read view will not see trx with id >= 0 1760648699, sees < 0 1760648699

I'm not sure if thread 1159866720 is actually materially involved, but it had been waiting for the exact same amount of time, so I've included it.  It might be an innocent bystander.

Additionally, note that thread 1158461792 is rolling back, and is stuck trying to get the lock to delete records, which it itself holds according to the above output and is corroborated by subsequent prints from other threads claiming that 1158461792 holds the lock.

A related question: Why doesn't rw_lock_x_lock assert immediately if you already own the lock instead of forcing a 10-minute deadlock and then assert anyway?

Regards,

Jeremy
[5 Sep 2007 3:17] Jeremy Cole
To clarify: I saw this on 5.0.45.
[5 Sep 2007 14:18] Heikki Tuuri
When you report this error, please include at least 1 MB of the .err log. I need to see several full printouts of the diagnostic information.

I am now looking at Kuraev's .err log. There, someone is holding the rollback segment mutex indefinitely, and someone is holding an S-latch on the btr0sea latch indefinitely. All semaphore waits are on those two.

Looking at trx0undo.c, a thread holding the rollback segment mutex could end up waiting for some page latch, or an fsp latch, but I do not see how it could end up waiting for the btr0sea latch! Has the thread holding the rollback segment mutex simply died?
[5 Sep 2007 14:48] Heikki Tuuri
Jeremy,

you have hit the ::close() hang bug which was fixed in 5.0.48:

http://bugs.mysql.com/bug.php?id=25645

Regards,

Heikki
[5 Sep 2007 15:00] Heikki Tuuri
The ::close() hang bug happens when the thread which 'has the adaptive hash latch' does this wait:

--Thread 1159866720 has waited at dict0dict.c line 298 (in
dict_table_decrement_handle_count) for 258.00 seconds the semaphore:
Mutex at 0x2aa6adb2b8 (dict_sys->mutex) created file dict0dict.c line 709 (in dict_init),
[5 Sep 2007 15:05] Heikki Tuuri
Another mystery in Kuraev's .err log is that no thread says it 'holds the adaptive hash latch', though the latch has a reader count of 1! This suggests a hardware problem or memory corruption by InnoDB.
[7 Sep 2007 16:27] Mark Leith
thread that is deadlocking with itself:

--Thread 1159264608 has waited at btr0sea.c line 489 for 242.00 seconds the semaphore:
X-lock on RW-latch at 0x2aaab22420b8 created in file btr0sea.c line 139
a writer (thread id 1159264608) has reserved it in mode  wait exclusive
number of readers 0, waiters flag 1
Last time read locked in file row0sel.c line 3326
Last time write locked in file btr0sea.c line 489

Read lock within row_search_for_mysql():

			/* This is a SELECT query done as a consistent read,
			and the read view has already been allocated:
			let us try a search shortcut through the hash
			index.
			NOTE that we must also test that
			mysql_n_tables_locked == 0, because this might
			also be INSERT INTO ... SELECT ... or
			CREATE TABLE ... SELECT ... . Our algorithm is
			NOT prepared to inserts interleaved with the SELECT,
			and if we try that, we can deadlock on the adaptive
			hash index semaphore! */

#ifndef UNIV_SEARCH_DEBUG			
			if (!trx->has_search_latch) {
				rw_lock_s_lock(&btr_search_latch);
				trx->has_search_latch = TRUE;
			}

Write lock within btr_search_info_update_slow():

	if (cursor->flag == BTR_CUR_HASH_FAIL) {
		/* Update the hash node reference, if appropriate */

		btr_search_n_hash_fail++;

		rw_lock_x_lock(&btr_search_latch);

		btr_search_update_hash_ref(info, block, cursor);

		rw_lock_x_unlock(&btr_search_latch);
	}
[11 Sep 2007 14:06] Heikki Tuuri
Vasil should add innodb_adaptive_hash_index my.cnf parameter to 5.0 and 5.1. The patch is courtesy of Mark Leith of MySQL AB.

I will continue analyzing the hangs. Some of them are due to the known InnoDB bug in < 5.0.48, some look like hardware problems.
[11 Sep 2007 14:16] Heikki Tuuri
The pending patch is only the patch to add the my.cnf option to switch off adaptive hash indexes, not that we would have found the bugs!
[20 Sep 2007 4:43] Yasufumi Kinoshita
Heikki,

This is not mystery at all.
I have encountered the same problems several times too.
Event handling misses cause this kind of problems.

Bug#29560
[24 Sep 2007 15:36] Ed Lucero
I'm having the same problem.

Hardware: Dell 1850 Dual Xeon 2.8Mhz 2G Memory
FreeBSD 6.0

Ver 5.0.45 for portbld-freebsd6.0 on i386 (FreeBSD port: mysql-server-5.0.45)

What is the status of this bug? Is there a patch or work around?

Ed
[24 Sep 2007 15:42] Heikki Tuuri
Ed,

please post one full output from your .err log.

The only fixed hang bug is this:
http://bugs.mysql.com/bug.php?id=25645

The rest remain a mystery.

Regards,

Heikki
[24 Sep 2007 21:27] Ed Lucero
Error log

Attachment: mysql-error-log.txt (text/plain), 74.77 KiB.

[24 Sep 2007 21:29] Ed Lucero
My.cnf

#
# $Header: /usr/local/ic/config/files/RCS/my.cnf,v 1.3 2007/09/21 04:29:13 elucero Exp $
#
# IC MySql Configuration File
#
# Example MySQL config file for very large systems.
#
# This is for a large system with memory of 1G-2G where the system runs mainly
# MySQL.
#
# You can copy this file to
# /etc/my.cnf to set global options,
# mysql-data-dir/my.cnf to set server-specific options (in this
# installation this directory is /var/db/mysql) or
# ~/.my.cnf to set user-specific options.
#
# In this file, you can use all long options that a program supports.
# If you want to know which options a program supports, run the program
# with the "--help" option.

# The following options will be passed to all MySQL clients
[client]
#password       = your_password
port            = 3306
socket          = /tmp/mysql.sock

# Here follows entries for some specific programs

# The MySQL server
[mysqld]
port            = 3306
socket          = /tmp/mysql.sock
skip-locking
key_buffer = 384M
max_allowed_packet = 1M
table_cache = 512
sort_buffer_size = 2M
read_buffer_size = 2M
read_rnd_buffer_size = 8M
myisam_sort_buffer_size = 64M
thread_cache_size = 8
query_cache_size = 32M
# Try number of CPU's*2 for thread_concurrency
thread_concurrency = 8
max_connections = 250

# Logging
log-error = /var/log/mysql/db2
log = /usr/mysql/db2-query.log
log-bin = /var/mysql/db2-bin
expire_logs_days = 7

# Don't listen on a TCP/IP port at all. This can be a security enhancement,
# if all processes that need to connect to mysqld run on the same host.
# All interaction with mysqld must be made via Unix sockets or named pipes.
# Note that using this option without enabling named pipes on Windows
# (via the "enable-named-pipe" option) will render mysqld useless!
#
#skip-networking

# Replication Master Server (default)
# binary logging is required for replication

# required unique id between 1 and 2^32 - 1
# defaults to 1 if master-host is not set
# but will not function as a master if omitted
server-id       = 1

# Replication Slave (comment out master section to use this)
#
# To configure this host as a replication slave, you can choose between
# two methods :
#
# 1) Use the CHANGE MASTER TO command (fully described in our manual) -
#    the syntax is:
#
#    CHANGE MASTER TO MASTER_HOST=<host>, MASTER_PORT=<port>,
#    MASTER_USER=<user>, MASTER_PASSWORD=<password> ;
#
#    where you replace <host>, <user>, <password> by quoted strings and
#    <port> by the master's port number (3306 by default).
#
#    Example:
#
#    CHANGE MASTER TO MASTER_HOST='125.564.12.1', MASTER_PORT=3306,
#    MASTER_USER='joe', MASTER_PASSWORD='secret';
#
# OR
#
# 2) Set the variables below. However, in case you choose this method, then
#    start replication for the first time (even unsuccessfully, for example
#    if you mistyped the password in master-password and the slave fails to
#    connect), the slave will create a master.info file, and any later
#    change in this file to the variables' values below will be ignored and
#    overridden by the content of the master.info file, unless you shutdown
#    the slave server, delete master.info and restart the slaver server.
#    For that reason, you may want to leave the lines below untouched
#    (commented) and instead use CHANGE MASTER TO (see above)
#
# required unique id between 2 and 2^32 - 1
# (and different from the master)
# defaults to 2 if master-host is set
# but will not function as a slave if omitted
#server-id       = 2
#
# The replication master for this slave - required
#master-host     =   <hostname>
#
# The username the slave will use for authentication when connecting
# to the master - required
#master-user     =   <username>
#
# The password the slave will authenticate with when connecting to
# the master - required
#master-password =   <password>
#
# The port the master is listening on.
# optional - defaults to 3306
#master-port     =  <port>
#
# binary logging - not required for slaves, but recommended
#log-bin=mysql-bin

# Point the following paths to different dedicated disks
#tmpdir         = /tmp/
#log-update     = /path-to-dedicated-directory/hostname

# Uncomment the following if you are using BDB tables
#bdb_cache_size = 384M
#bdb_max_lock = 100000

# Uncomment the following if you are using InnoDB tables
innodb_data_home_dir = /ic/mysql/
innodb_data_file_path = ibdata1:10M:autoextend
innodb_log_group_home_dir = /ic/mysql/
innodb_log_arch_dir = /ic/mysql/
# You can set .._buffer_pool_size up to 50 - 80 %
# of RAM but beware of setting memory usage too high
innodb_buffer_pool_size = 384M
innodb_additional_mem_pool_size = 20M
# Set .._log_file_size to 25 % of buffer pool size
#innodb_log_file_size = 100M
innodb_log_file_size = 64M
innodb_log_buffer_size = 8M
innodb_flush_log_at_trx_commit = 1
innodb_lock_wait_timeout = 50

[mysqldump]
quick
max_allowed_packet = 16M

[mysql]
no-auto-rehash
# Remove the next comment character if you are not familiar with SQL
#safe-updates

[isamchk]
key_buffer = 256M
sort_buffer_size = 256M
read_buffer = 2M
write_buffer = 2M

[myisamchk]
key_buffer = 256M
sort_buffer_size = 256M
read_buffer = 2M
write_buffer = 2M

[mysqlhotcopy]
interactive-timeout
[24 Sep 2007 21:30] Ed Lucero
myslqd process

/usr/local/libexec/mysqld --defaults-extra-file=/ic/mysql/my.cnf --basedir=/usr/local --datadir=/ic/mysql --user=mysql --pid-file=/ic/mysql/db2.sandiegort.com.pid --port=3306 --socket=/tmp/mysql.sock --log-warnings=2
[26 Sep 2007 22:41] Ed Lucero
Was this helpful? Did you need more?

Ed
[28 Sep 2007 5:08] Yasufumi Kinoshita
Heikki,

I think it may be the glibc's bug.
And it may cause on 32bit systems too.

Bug#29560

Regards,

Yasufumi
[2 Oct 2007 2:44] Yasufumi Kinoshita
Heikki,

Sorry for my jumping to a conclusion.

After all, it may be InnoDB's bug.
(os_event_reset,os_event_set,os_event_wait)
Bug#29560

Regards,

Yasufumi
[2 Oct 2007 16:30] Heikki Tuuri
Yasufumi has found a bug in >= InnoDB-5.0.30 that can explain many of the hangs on the adaptive hash latch:

http://bugs.mysql.com/bug.php?id=29560

The symptom is that there is a waiting X-lock request, but the count of S-locks is zero.
[2 Oct 2007 16:40] Heikki Tuuri
Ed Lucero has probably hit this bug:
http://bugs.mysql.com/bug.php?id=29560
[3 Oct 2007 5:47] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/34798

ChangeSet@1.2508, 2007-10-02 23:47:30-06:00, tsmith@sita.local +4 -0
  Bug #20358: InnoDB hang on the adaptive hash index latch in btr0sea.c
  
  Add --skip-innodb-adaptive-hash-index option, which is a way to
  work around the bug (by disabling the adaptive hash feature entirely).
  
  This may be useful even once the bug is fixed, for benchmarking purposes.
  There are some workloads for which the adaptive hash index is not effective.
[18 Oct 2007 21:34] Bugs System
Pushed into 5.1.23-beta
[18 Oct 2007 21:37] Bugs System
Pushed into 5.0.52
[8 Nov 2007 23:57] Ed Lucero
I added the patch for bug 29560 re-compiled and installed. Im still have problems. I uploaded the log
files. bug-data-1.29560.gz, bug-data-2.29560.gz, bug-data-3.29560.gz

Ed
[9 Nov 2007 22:11] Ed Lucero
Hi All

We have had 11 restarts in the the last 2 days.

What kind of SQL code should I be looking for that will cause this?

Ed

071108 03:47:57  mysqld restarted
071108 07:28:59  mysqld restarted
071108 08:08:00  mysqld restarted
071108 08:08:29  mysqld restarted
071108 11:13:26  mysqld sretarted
071109 01:46:18  mysqld restarted
071109 06:30:10  mysqld restarted
071109 06:51:04  mysqld restarted
071109 07:27:35  mysqld restarted
071109 09:30:36  mysqld restarted
071109 13:20:53  mysqld restarted
[13 Nov 2007 15:05] Heikki Tuuri
Ed,

please attach the .err logs to this bug report, or email them to me if they are < 3 MB.

Regards,

Heikki
[14 Nov 2007 12:31] Heikki Tuuri
Assigning this to Inaam, because he knows InnoDB latches quite well now.

About 50 % of the hangs reported under this bug report are really: http://bugs.mysql.com/bug.php?id=29560

The rest have an unknown cause. I suspect hardware faults and OS bugs are behind some reports.
[14 Nov 2007 12:34] Heikki Tuuri
The option --skip-innodb-adaptive-hash-index is now in 5.0 and 5.1. Setting it may remove hangs in some cases.

Putting this bug report back to the 'Open' status, as we do not know the reason for many of the hangs.
[14 Nov 2007 13:46] Heikki Tuuri
Ed,

what exact FreeBSD version you are running?

Regards,

Heikki
[15 Nov 2007 15:56] Ed Lucero
FreeBSD 6.0
[15 Nov 2007 16:13] Ed Lucero
I tried starting mysqld with the --skip-innodb-adaptive-hash-index option and it gave me this error.

071115  8:00:28 [ERROR] /usr/local/libexec/mysqld: unknown option '--skip-innodb-adaptive-hash-index'

on

/usr/local/libexec/mysqld  Ver 5.0.45 for portbld-freebsd6.0 on i386 (FreeBSD port: mysql-server-5.0.45_1)
[23 Jun 2008 17:47] Heikki Tuuri
Another unexplained hang: http://bugs.mysql.com/bug.php?id=36018
[12 Sep 2008 11:31] Inaam Rana
I have checked three error logs from associated issues and there is a pattern here.

In all these cases I found that the thread holding the search latch ends up waiting for dict_sys->mutex in dict_table_get_and_increment_handle_count() which is called only from ::open(). This is a clear breach of InnoDB's latching order.

Is MySQL allowed to call ::open while holding btr_search_latch I am not sure about (though I doubt that). But we can, in any case, put in a call to innobase_release_temporary_latches() at the start of ::open()
[16 Sep 2008 15:34] Inaam Rana
I have opened Bug#39483 to deal with ::open() hang.
MySQL support engineers for various issues attached with this bug should analyse the .err file as described in Bug#39483 and should associate the respective issues with that bug if they found a ::open() hang.
[1 Dec 2008 19:26] Mikhail Izioumtchenko
setting status to Varified. It would be nice to get MySQL's triage values as well
[3 Dec 2008 16:23] Mikhail Izioumtchenko
reassigning to Inaam, the unassignment was accidental
[19 Jan 2009 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[29 May 2009 2:51] James Day
[28 May 4:46] Mulyadi Pasaribu asked "Why this bug is suspended?" in a comment that I've hidden from public view because it may simply be linkspam.

It's suspended because it was really a collection of different bugs that had symptoms that appeared similar. The real problems were fixed in different, later, bug reports. This one did the useful job of adding the ability to turn off the adaptive hash index.

If you think that this bug report still matters to a problem you're having, first read this guide to identifying the real bug. If you still can't find a relevant bug report, please open a new bug instead of adding to this one. Or just upgrade to a more recent version that has the fixes and speed improvements that they bring included already.

--------

1. Bug #25645: Check for a wait for a mutex created in dict0dict.c in dict_init. If you see this, look to bug #25645 instead and upgrade to version 5.0.48, 5.1.21 or later. Example: 

--Thread 1159866720 has waited at dict0dict.c line 298 (in dict_table_decrement_handle_count) for 258.00 seconds the semaphore: Mutex at 0x2aa6adb2b8 (dict_sys->mutex) created file dict0dict.c line 709 (in dict_init), lock var 1 waiters flag 1 

--------

2. Bug #29560: Check for a waiting X-lock request and 0 readers, adaptive hash index need not be involved. You may also see this as hanging on the adaptive hash rw-lock with 'waiting for an X-lock'. If you see this look to bug #29560 instead and upgrade to version 5.0.54, 5.1.24 and 6.0.4 or later, which solved most adaptive hash hangs (still leaving adaptive hash performance issues, though). Example: 

InnoDB: Warning: a long semaphore wait: --Thread 1191237984 has waited at log0log.c line 2026 for 241.00 seconds the semaphore: S-lock on RW-latch at 0x2aaab1c17b78 created in file log0log.c line 810 a writer (thread id 1191237984) has reserved it in mode exclusive number of readers 0, waiters flag 1 

Example 2: 

S-lock on RW-latch at 5166de8 created in file btr0sea.c line 139 a writer (thread id 1010) has reserved it in mode wait exclusive number of readers 0, waiters flag 1 

--------

3. Bug #32149: Longest-running query or queries include "Copying to tmp table" probably while sorting, adaptive hash index lock still held. Upgrade to 5.0.58, 5.1.24, 6.0.5. Some other causes still unresolved, see that bug for details. 

--------

4. Bug #39483: read that bug for the diagnosis procedure. Upgrade to 5.0.76, 5.1.31 or 6.0.10.

--------

The only reason this bug report is still open is that I haven't finished reviewing every customer issue that ever referenced it to ensure that we've notified people of the fix for their problem. Once I've done that we'll be closing it, opening a new bug for anything that still might not be resolved.

James Day, MySQL Senior Support Engineer, Sun Microsystems
[3 Oct 2009 14:17] Mark Callaghan
Great summary James. I got lost on this report until I reached that. There is one more potential cause -- some 4-core AMD opterons have a HW bug

see http://bugs.mysql.com/bug.php?id=26081

// Opteron Rev E has a bug in which on very rare occasions a locked
// instruction doesn't act as a read-acquire barrier if followed by a
// non-locked read-modify-write instruction.  Rev F has this bug in 
// pre-release versions, but not in versions released to customers,
// so we test only for Rev E, which is family 15, model 32..63 inclusive.

Look at src/base/atomicops-internals-x86.{h,cc} from
http://code.google.com/p/google-perftools