Bug #45813 Warning: a long semaphore wait
Submitted: 28 Jun 2009 21:12 Modified: 30 Jun 2009 17:55
Reporter: Nickolay Vinogradov Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S2 (Serious)
Version:5.1.34 OS:Linux (SUSE Linux Enterprise Server 10 SP2 (x86_64))
Assigned to: CPU Architecture:Any
Tags: long semaphore wait

[28 Jun 2009 21:12] Nickolay Vinogradov
Description:
MySQL crashed.
From log file:

InnoDB: ###### Diagnostic info printed to the standard error stream
InnoDB: Error: semaphore wait has lasted > 600 seconds
InnoDB: We intentionally crash the server, because it appears to be hung.
090629  0:09:33  InnoDB: Assertion failure in thread 1132489024 in file srv/srv0srv.c line 2132
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.1/en/forcing-recovery.html
InnoDB: about forcing recovery.
090629  0:09:33 - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=33554432
read_buffer_size=2097152
max_used_connections=914
max_threads=2000
threads_connected=914
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 20533127 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = (nil) thread_stack 0x30000
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x83ae6e]
/usr/sbin/mysqld(handle_segfault+0x322)[0x5b89f2]
/lib64/libpthread.so.0[0x2abc02caac00]
/lib64/libc.so.6(gsignal+0x35)[0x2abc0328ebb5]
/lib64/libc.so.6(abort+0x110)[0x2abc0328ffb0]
/usr/sbin/mysqld[0x7b6c31]
/lib64/libpthread.so.0[0x2abc02ca3143]
/lib64/libc.so.6(__clone+0x6d)[0x2abc0331f8cd]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.

How to repeat:
Unknown. Please look at innodb monitor output.
[28 Jun 2009 21:16] Nickolay Vinogradov
InnoDB Monitor Output

Attachment: Innodb_Montor.log (application/octet-stream, text), 140.63 KiB.

[28 Jun 2009 21:22] Nickolay Vinogradov
Hmmm. it's very interesting: "Thread 1168980288 has waited Thread 1168980288". Is it OK?

--Thread 1168980288 has waited at row/row0mysql.c line 1720 for 247.00 seconds the semaphore:
X-lock on RW-latch at 0xc73b20 created in file dict/dict0dict.c line 728
a writer (thread id 1168980288) has reserved it in mode  wait exclusive
number of readers 1, waiters flag 1

---TRANSACTION 11 2785646909, ACTIVE 247 sec, process no 8320, OS thread id 1168980288 inserting, thread declared inside InnoDB 500
mysql tables in use 1, locked 1
ROLLING BACK 2 lock struct(s), heap size 368, 1 row lock(s)
MySQL thread id 212786, query id 792903208 10.0.0.6 spaces update
INSERT INTO spaces_blacklist SET user_id='384522',enemy_id='899930'
[29 Jun 2009 8:29] Valeriy Kravchuk
Thank you for the problem report. Please, send your my.cnf. How many CPUs/cores do you have?
[29 Jun 2009 9:41] Nickolay Vinogradov
my.cnf

Attachment: my.cnf (application/octet-stream, text), 20.73 KiB.

[29 Jun 2009 9:45] Nickolay Vinogradov
8 cores (2 x Xeon E5405)
Mem: 32G
[29 Jun 2009 10:05] Valeriy Kravchuk
I think this is related to large number of active InnoDB threads and/or purge activity. If the problem is easily repeatable, can you, please, try to use different values for the following parameters:

innodb_max_purge_lag = 100000
innodb_max_purge_lag = 1000000
innodb_max_purge_lag = 10000000

and smaller thread concurrency:

innodb_thread_concurrency = 16
[29 Jun 2009 11:42] Nickolay Vinogradov
The problem is NOT easily repeatable.
This is production server, under heavy load.
[29 Jun 2009 11:57] Valeriy Kravchuk
We had similar bug reports in the past (see bug #45070, for example). They are hardly repeatable and are related to high concurrent load. So, I'd say it's more a matter of server tuning than a bug by itself. Assertion and resulting crash are intended and were added by InnoDB to prevent hangs and force DBA to pay attention to server's tuning for the load.
[29 Jun 2009 13:49] Valeriy Kravchuk
Isn't it a duplicate of your other open bug report, http://bugs.mysql.com/bug.php?id=44841, actually?
[29 Jun 2009 16:10] Nickolay Vinogradov
One thread deadlock himself, a you call it problem with the server tuning?
[29 Jun 2009 16:12] Nickolay Vinogradov
No, it's not a DUP, because now a thread deadlock himself.
[29 Jun 2009 16:44] Valeriy Kravchuk
In that older bug you had similar situation with thread "deadlocking" itself, see initial description:

--Thread 1173395776 has waited at row/row0mysql.c line 1720 for 250.00 seconds the semaphore:
X-lock on RW-latch at 0xc73b20 created in file dict/dict0dict.c line 728
a writer (thread id 1173395776) has reserved it in mode  wait exclusive
number of readers 1, waiters flag 1
Last time read locked in file row/row0mysql.c line 1685
Last time write locked in file row/row0mysql.c line 1720

and it was doing ROLLBACK:

---TRANSACTION 9 1144471130, ACTIVE 250 sec, process no 10361, OS thread id 1173395776 inserting, thread declared inside InnoDB 500
mysql tables in use 1, locked 1
ROLLING BACK 2 lock struct(s), heap size 368, 1 row lock(s)

That's why I consider this report a duplicate. Please, check.
[29 Jun 2009 17:07] Nickolay Vinogradov
OK, it's DUP.
But why you call it 'tuning problem' ?
[30 Jun 2009 17:55] Valeriy Kravchuk
So, this is a duplicate of bug #44841. Let's continue discussion there...