Bug #32157 Crash from a long semaphore wait on Innodb checkpoint lock
Submitted: 7 Nov 2007 6:32 Modified: 10 Nov 2007 21:02
Reporter: Mark Callaghan
Status: Duplicate
Category:Server: InnoDB Severity:S3 (Non-critical)
Version:5.0.37 OS:Linux
Assigned to: Heikki Tuuri Target Version:
Tags: mutex, checkpoint, crash, wait, Semaphore, long, innodb

[7 Nov 2007 6:32] Mark Callaghan
Description:
Innodb crashed MySQL because of a long semaphore wait on log_sys->checkpoint_lock

The initial warning message. The thread is waiting for an S lock on a mutex for which it
holds the X lock. I think this is because IO completion by the background IO log thread
will trigger a release of the X lock.

InnoDB: Warning: a long semaphore wait:
--Thread 1191237984 has waited at log0log.c line 2026 for 241.00 seconds the semaphore:
S-lock on RW-latch at 0x2aaab1c17b78 created in file log0log.c line 810
a writer (thread id 1191237984) has reserved it in mode  exclusive
number of readers 0, waiters flag 1
Last time read locked in file log0log.c line 2026
Last time write locked in file log0log.c line 1821

Status of the log background IO thread, note that 'ev set' is printed

I/O thread 1 state: complete io for log (log thread) reads 0 writes 73567 requests 73567
io secs 11.267639 io msecs/request 0.153162 max_io_wait 5182.570000 ev set

There appears to be pending IO for the log thread:

Pending flushes (fsync) log: 2; buffer pool: 0

There appears to be pending IO for the log thread:

---
LOG
---
Log sequence number 45 3602059595
Log flushed up to   45 3600590787
Last checkpoint at  45 3318244008
1 pending log writes, 1 pending chkp writes
11450964 log i/o's done, 0.00 log i/o's/second, 675754 syncs, 74369 checkpoints
log sync callers: 10043382 buffer pool, background 686763 sync and 205450 async, 315
internal, checkpoint 74369 sync and 0 async, 0 archive, commit
0 sync and 10536449 async
log sync syncers: 2950 buffer pool, background 672803 sync and 0 async, 0 internal,
checkpoint 1 sync and 0 async, 0 archive, commit 0 sync and 0 async

The thread mentioned here is also the one for which the long semaphore wait occurs:

--------------
ROW OPERATIONS
--------------
3 queries inside InnoDB, 0 queries in queue
4 read views open inside InnoDB
Main thread process no. 10529, id 1191237984, state: making checkpoint
Number of rows inserted 289403566, updated 44947714, deleted 8597961, read 39518210474
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 4399.23 reads/s

And I just lost ssh access to the host with the problem.

How to repeat:
NA

Suggested fix:
Not a fix, but a question. Is it possible that the pending IO by the background log IO
thread will trigger a release of the X lock which is blocking the attempt to get the S
lock on log_sys->checkpoint_lock? If that is the case, is the issue that the background
log IO thread did not get scheduled to run when there was a pending IO?
[7 Nov 2007 14:35] Heikki Tuuri
This is probably a duplicate of http://bugs.mysql.com/bug.php?id=29560 because we have a
waiting X-lock request and 0 readers here.
[10 Nov 2007 21:02] Mark Callaghan
It is also possible that disk errors lead to this problem. If the server retries the IO
long enough, the long wait for checkpoint IO to finish occurs. I have only had this error
on one server and I think a hardware error is likely.