MySQL Bugs: #47183: long semaphore wait terminates server

Bug #47183	long semaphore wait terminates server
Submitted:	8 Sep 2009 1:45	Modified:	29 Apr 2011 22:59
Reporter:	Andrew Dalgleish	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S2 (Serious)
Version:	5.1.30	OS:	Linux
Assigned to:	Marko Mäkelä	CPU Architecture:	Any

Description:
A customer reported server crash.

From the supplied error log, it looks like a semaphore can have a long wait while the server is committing a large transaction.

Waiting > 240 seconds starts the monitor output, and > 600 seconds will terminate the server (srv/srv0src.c line 2122 in 5.1.30).

How to repeat:
Unknown - possibly commit a very large transaction on a slow disk system?

Suggested fix:
Can the 240s and 600s timeouts be made configurable?

Verified as described

Bug #45988 is a duplicate of this bug here

not verified, no repeatable test case.
Regarding:
Can the 240s and 600s timeouts be made configurable?
feature request,not a bug. imo the numbers are quite big already. 600 seconds is a lot for a computer application.
A single row insert of 900 seconds and a commit of 6000 seconds would indicate a problem, most likely an application one. Could you tell us more information about the application and the system performance in general, particularly, if the problem in question was a onetime one or if there is a general lack of performance.

For what do you need a test case?

We made source code analysis here.

You just need to look into srv/srv0src.c line 2122 in 5.1.30.

It is hard coded there.

We could make the timeout configurable. If 600 seconds is too short, make it 7200 seconds (= 2 hours).

This may be a hang. Please post the entire .err log so that we see what InnoDB is doing during this excessive 600 seconds.

We currently have the same bug on mysql 5.1.30 Our server is killed with the following message:

InnoDB: Error: semaphore wait has lasted > 600 seconds
InnoDB: We intentionally crash the server, because it appears to be hung.

Is it just me or does anyone else find it odd that a database server is "deliberately crashed" based on a lock issue.

Is the only solution for this to rollback to mysql 4.n? We are aware that there are probably several application issues that are causing this but the fact that we now know MySQL is deliberately crashed if we cock up and us only being human means that we now cannot depend on the database to stay up through thick and thin.

I've seen some other threads ask that the timeout setting be made configurable. While this might provide some relief I would not regard this as a solution. The main Database process should never crash intentionally or otherwise. I know this is a high standard to set and in some cases ie connection flooding, impossible to attain, but I still believe it should be the aim of any database to remain up.

The 600-second watchdog should have nothing to do with user locks (InnoDB table and record locks). It is about InnoDB mutexes and rw-locks, which should never be held for excessive periods of time. For example, whenever InnoDB transfers control to the operating system (e.g., for I/O) or to MySQL (e.g., returning result to client and waiting for next command), it should release all mutexes and rw-locks that are not necessary.

In CHECK TABLE, this timeout is advanced by 2 hours (7200 seconds) in order to fix Bug #2694.

Removing the 600-second watchdog timeout altogether is analogous to shooting the messenger. What we need is to find out why the watchdog is kicking in. Can you obtain a core dump and post stack traces? Something like this:

gdb mysqld core
set height 0
set log on
thr apply all backtrace full
quit

and attach gdb.txt.

I also hit this bug several times  with mysql 5.1.37 default innodb plugin and mysql 5.1.44 innodb plugin 1.0.6.

I currently see the same symptoms when running a specific Random Query Generator (aka RQG, http://forge.mysql.com/wiki/Category:RandomQueryGenerator) test against a specific development branch. I will attach the error log and stacktraces in separate files soon, hoping it will help the investigations.

RQG command line used:

perl runall.pl \
--grammar=conf/optimizer/optimizer_subquery.yy \
--engine=InnoDB \
--basedir=$CODE \
--vardir=$PWD/var-single \
--queries=100K \
--duration=900 \
--threads=1 \
--reporter=Backtrace,Shutdown

Stacktraces from all threads on semaphore wait timeout crash using the RQG.

Attachment: stacktraces_bug47183_rqg.txt (text/plain), 31.08 KiB.

Error log from semaphore wait timeout crash using the RQG.

Attachment: master.err.bug47183_rqg.txt (text/plain), 200.06 KiB.

Stacktraces from all threads on semaphore wait timeout crash using the RQG against a debug build.

Attachment: stacktraces_bug47183_rqg_debug.txt (text/plain), 31.21 KiB.

I logged Bug#56529 for symptoms/causes observed using the RQG as described previously. This is because the cause in this case seems to be related to the yet-to-be-released feature Index Condition Pushdown.

As part of the Triage, I agree with the InnoDB team diagnosis that the actual bug report is at best a feature request, and one that should not be implemented: it requests a workaround, that masks the real cause, not a solution for the problem.

The crash that happens on the "diagnosed" line is an InnoDB sanity assertion that should never be violated.

A sensible way to address the underlying cause is to make InnoDB engine more resilient to an internal failure or develop a more mechanism for getting the necessary diagnostics information in case of severe malfunction as this one than a crash + a coredump.

A configurable timeout is not a solution that is viable long term.

Thus setting risk and effort to E4/R4.
Suggestion to InnoDB team: double the timeout, but keep it hard-coded.

Bug #59733 causes a deadlock, which will be resolved by a long semaphore wait abort. Bug #50723 may cause an "unnecessary" abort during CHECK TABLE, because CHECK TABLE is not releasing latches every now and then. I would not lengthen the timeout, but instead fix the bugs that are triggering the watchdog.

Added to changelog for 5.1.57, 5.5.12, 5.6.3:

The server could halt if InnoDB interpreted a very heavy I/O load for
15 minutes or more as an indication that the server was hung. This
change fixes the logic that measures how long InnoDB threads were
waiting, which formerly could produce false positives.

What change fixed this?

Mark,  this is a duplicate of Oracle bug #11877216
Patch: http://lists.mysql.com/commits/134235

Mark, this is an interesting fix if your experience matches ours that most cases where the watchdog thread trips the server are now false alarms caused by high load rather than real hangs. Very large deletes of old uncached rows in 5.1 are the sort of thing that could cause nuisance trips.

The proximate cause to finding that these are false alarms was where a server with far too small innodb log file size entered async flushing sometimes during large batch jobs. Sometimes the extra i/o load slowed things down enough for the watchdog thread to trip.

I think this may eliminate 90% or more of the cases where we currently see watchdog thread trips.

Mark, the previous (faulty) logic was roughly as follows.

If the longest-waited-for InnoDB mutex or rw-lock has been waited for 600 seconds (10 minutes), increment a counter and wait 30 seconds. If the situation persists for 10 successive checks, kill the server (after 600+30*10 seconds from start of the wait). Otherwise, reset the counter.

The problem with this was that it fails to account for the case where the longest-waited-for mutex or rw-lock is changing. It could be that mutex M1 has been waited for 600 seconds, mutex M2 for 530 seconds, ..., mutex M10 for 330 seconds. If mutex M1 is granted after the watchdog check, on the next round the watchdog would see that mutex M2 has been waited for "too long", and finally mutex M10 has been waited for "too long" and the server is killed without a real reason.

Originally the rule was to start diagnostic printouts at 240 seconds and kill the server at 600 seconds. Several years ago, I relaxed it to kill the server at 600+30*10 seconds. Now I fixed it so that the watchdog remembers the longest-waited-for mutex so that the counter will be reset if the longest-waited-for mutex changes or the longest wait time drops below 600 seconds. The kill time for a real hang should remain at 600+30*10 seconds (15 minutes).

Thanks for the updates. Fortunately, this has not been a problem for us. The only crashes we get are from InnoDB asserting after reading a corrupt page.