Bug #47183 long semaphore wait terminates server
Submitted: 8 Sep 3:45 Modified: 15 Oct 18:42
Reporter: Andrew Dalgleish
Status: Verified
Category:Server: InnoDB Severity:S2 (Serious)
Version:5.1.30 OS:Linux
Assigned to: Marko Mäkelä Target Version:
Triage: Triaged: D5 (Feature request)

[8 Sep 3:45] Andrew Dalgleish
Description:
A customer reported server crash.

From the supplied error log, it looks like a semaphore can have a long wait while the
server is committing a large transaction.

Waiting > 240 seconds starts the monitor output, and > 600 seconds will terminate the
server (srv/srv0src.c line 2122 in 5.1.30).

How to repeat:
Unknown - possibly commit a very large transaction on a slow disk system?

Suggested fix:
Can the 240s and 600s timeouts be made configurable?
[8 Sep 7:30] Susanne Ebrecht
Verified as described
[8 Sep 11:32] Susanne Ebrecht
Bug #45988 is a duplicate of this bug here
[8 Sep 18:16] Michael Izioumtchenko
not verified, no repeatable test case.
Regarding:
Can the 240s and 600s timeouts be made configurable?
feature request,not a bug. imo the numbers are quite big already. 600 seconds is a lot
for a computer application.
A single row insert of 900 seconds and a commit of 6000 seconds would indicate a problem,
most likely an application one. Could you tell us more information about the application
and the system performance in general, particularly, if the problem in question was a
onetime one or if there is a general lack of performance.
[17 Sep 9:27] Susanne Ebrecht
For what do you need a test case?

We made source code analysis here.

You just need to look into srv/srv0src.c line 2122 in 5.1.30.

It is hard coded there.
[17 Sep 15:41] Heikki Tuuri
We could make the timeout configurable. If 600 seconds is too short, make it 7200 seconds
(= 2 hours).
[17 Sep 15:43] Heikki Tuuri
This may be a hang. Please post the entire .err log so that we see what InnoDB is doing
during this excessive 600 seconds.
[25 Nov 11:28] Harry Jackson
We currently have the same bug on mysql 5.1.30 Our server is killed with the following
message:

InnoDB: Error: semaphore wait has lasted > 600 seconds
InnoDB: We intentionally crash the server, because it appears to be hung.

Is it just me or does anyone else find it odd that a database server is "deliberately
crashed" based on a lock issue.

Is the only solution for this to rollback to mysql 4.n? We are aware that there are
probably several application issues that are causing this but the fact that we now know
MySQL is deliberately crashed if we cock up and us only being human means that we now
cannot depend on the database to stay up through thick and thin.

I've seen some other threads ask that the timeout setting be made configurable. While
this might provide some relief I would not regard this as a solution. The main Database
process should never crash intentionally or otherwise. I know this is a high standard to
set and in some cases ie connection flooding, impossible to attain, but I still believe
it should be the aim of any database to remain up.
[25 Nov 11:46] Marko Mäkelä
The 600-second watchdog should have nothing to do with user locks (InnoDB table and record
locks). It is about InnoDB mutexes and rw-locks, which should never be held for excessive
periods of time. For example, whenever InnoDB transfers control to the operating system
(e.g., for I/O) or to MySQL (e.g., returning result to client and waiting for next
command), it should release all mutexes and rw-locks that are not necessary.

In CHECK TABLE, this timeout is advanced by 2 hours (7200 seconds) in order to fix Bug
#2694.

Removing the 600-second watchdog timeout altogether is analogous to shooting the
messenger. What we need is to find out why the watchdog is kicking in. Can you obtain a
core dump and post stack traces? Something like this:

gdb mysqld core
set height 0
set log on
thr apply all backtrace full
quit

and attach gdb.txt.