MySQL Bugs: #32157: Crash from a long semaphore wait on Innodb checkpoint lock

Bug #32157	Crash from a long semaphore wait on Innodb checkpoint lock
Submitted:	7 Nov 2007 5:32	Modified:	10 Nov 2007 20:02
Reporter:	Mark Callaghan	Email Updates:
Status:	Duplicate	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S3 (Non-critical)
Version:	5.0.37	OS:	Linux
Assigned to:	Assigned Account	CPU Architecture:	Any
Tags:	checkpoint, crash, innodb, long, mutex, Semaphore, wait

Description:
Innodb crashed MySQL because of a long semaphore wait on log_sys->checkpoint_lock

The initial warning message. The thread is waiting for an S lock on a mutex for which it holds the X lock. I think this is because IO completion by the background IO log thread will trigger a release of the X lock.

InnoDB: Warning: a long semaphore wait:
--Thread 1191237984 has waited at log0log.c line 2026 for 241.00 seconds the semaphore:
S-lock on RW-latch at 0x2aaab1c17b78 created in file log0log.c line 810
a writer (thread id 1191237984) has reserved it in mode  exclusive
number of readers 0, waiters flag 1
Last time read locked in file log0log.c line 2026
Last time write locked in file log0log.c line 1821

Status of the log background IO thread, note that 'ev set' is printed

I/O thread 1 state: complete io for log (log thread) reads 0 writes 73567 requests 73567 io secs 11.267639 io msecs/request 0.153162 max_io_wait 5182.570000 ev set

There appears to be pending IO for the log thread:

Pending flushes (fsync) log: 2; buffer pool: 0

There appears to be pending IO for the log thread:

---
LOG
---
Log sequence number 45 3602059595
Log flushed up to   45 3600590787
Last checkpoint at  45 3318244008
1 pending log writes, 1 pending chkp writes
11450964 log i/o's done, 0.00 log i/o's/second, 675754 syncs, 74369 checkpoints
log sync callers: 10043382 buffer pool, background 686763 sync and 205450 async, 315 internal, checkpoint 74369 sync and 0 async, 0 archive, commit
0 sync and 10536449 async
log sync syncers: 2950 buffer pool, background 672803 sync and 0 async, 0 internal, checkpoint 1 sync and 0 async, 0 archive, commit 0 sync and 0 async

The thread mentioned here is also the one for which the long semaphore wait occurs:

--------------
ROW OPERATIONS
--------------
3 queries inside InnoDB, 0 queries in queue
4 read views open inside InnoDB
Main thread process no. 10529, id 1191237984, state: making checkpoint
Number of rows inserted 289403566, updated 44947714, deleted 8597961, read 39518210474
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 4399.23 reads/s

And I just lost ssh access to the host with the problem.

How to repeat:
NA

Suggested fix:
Not a fix, but a question. Is it possible that the pending IO by the background log IO thread will trigger a release of the X lock which is blocking the attempt to get the S lock on log_sys->checkpoint_lock? If that is the case, is the issue that the background log IO thread did not get scheduled to run when there was a pending IO?

This is probably a duplicate of http://bugs.mysql.com/bug.php?id=29560 because we have a waiting X-lock request and 0 readers here.

It is also possible that disk errors lead to this problem. If the server retries the IO long enough, the long wait for checkpoint IO to finish occurs. I have only had this error on one server and I think a hardware error is likely.

I'm seeing these same crashes now in 5.5.  It's happened on multiple machines, so hardware issue seems less likely.

Here is the relevant output (I have more if you'd like to see them).  Also, in each case, there are *no* other transactions open/running:

=====================================
120721  0:04:06 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 20 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 59978 1_second, 56279 sleeps, 5485 10_second, 13557 background, 13542 flush
srv_master_thread log flush and writes: 64341
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 26984, signal count 50130
--Thread 139812395484928 has waited at log0log.c line 2049 for 947.00 seconds the semaphore:
S-lock on RW-latch at 0x10f02058 created in file log0log.c line 832
a writer (thread id 139812395484928) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file log0log.c line 2049
Last time write locked in file /root/mysql-5.5.21/storage/innobase/log/log0log.c line 1846
Mutex spin waits 151742, rounds 389600, OS waits 7827
RW-shared spins 31546, rounds 561979, OS waits 14909
RW-excl spins 21049, rounds 206218, OS waits 3968
Spin rounds per wait: 2.57 mutex, 17.81 RW-shared, 9.80 RW-excl
------------
TRANSACTIONS
------------
Trx id counter F73573
Purge done for trx's n:o < F731D8 undo n:o < 0
History list length 388
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION F73572, not started
MySQL thread id 55825, OS thread handle 0x7f2899323700, query id 12272988 Reading event from the relay log
---TRANSACTION 0, not started
MySQL thread id 152, OS thread handle 0x7f289a26b700, query id 12273008 P-DATAMINING-S-112-1.voltalis.com 172.17.112.1 monyoguser executing
SHOW /*!50002 GLOBAL*/ STATUS
--------
FILE I/O
--------
I/O thread 0 state: waiting for completed aio requests (insert buffer thread)
I/O thread 1 state: complete io for log (log thread)
I/O thread 2 state: waiting for completed aio requests (read thread)
I/O thread 3 state: waiting for completed aio requests (read thread)
I/O thread 4 state: waiting for completed aio requests (read thread)
I/O thread 5 state: waiting for completed aio requests (read thread)
I/O thread 6 state: waiting for completed aio requests (write thread)
I/O thread 7 state: waiting for completed aio requests (write thread)
I/O thread 8 state: waiting for completed aio requests (write thread)
I/O thread 9 state: waiting for completed aio requests (write thread)
Pending normal aio reads: 0 [0, 0, 0, 0] , aio writes: 0 [0, 0, 0, 0] ,
 ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0
Pending flushes (fsync) log: 1; buffer pool: 0
107293 OS file reads, 2488278 OS file writes, 96907 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 1560, seg size 1562, 18659 merges
merged operations:
 insert 105817, delete mark 100182, delete 2458
discarded operations:
 insert 0, delete mark 0, delete 0
Hash table size 18701467, node heap has 5285 buffer(s)
0.00 hash searches/s, 0.00 non-hash searches/s
---
LOG
---
Log sequence number 445540058910
Log flushed up to   445534035089
Last checkpoint at  445533435097
0 pending log writes, 1 pending chkp writes
391592 log i/o's done, 0.00 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total memory allocated 8688238592; in additional pool allocated 0
Dictionary memory allocated 681653
Buffer pool size   518143
Free buffers       346409
Database pages     166449
Old database pages 61423
Modified db pages  9088
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 967, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 107152, created 59297, written 2049526
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 166449, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
1 read views open inside InnoDB
Main thread process no. 29876, id 139812395484928, state: making checkpoint
Number of rows inserted 8105048, updated 12533281, deleted 747788, read 4199787701
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================
InnoDB: ###### Diagnostic info printed to the standard error stream
InnoDB: Error: semaphore wait has lasted > 600 seconds
InnoDB: We intentionally crash the server, because it appears to be hung.
120721  0:04:13  InnoDB: Assertion failure in thread 139812378699520 in file srv0srv.c line 2455
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
22:04:13 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.

key_buffer_size=2147483648
read_buffer_size=2097152
max_used_connections=57
max_threads=151
thread_count=57
connection_count=57
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2717370 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x40000
/usr/local/mysql/bin//mysqld(my_print_stacktrace+0x35)[0x7b4c45]
/usr/local/mysql/bin//mysqld(handle_fatal_signal+0x4a4)[0x686484]
/lib/libpthread.so.0(+0xeff0)[0x7f2b351eeff0]
/lib/libc.so.6(gsignal+0x35)[0x7f2b340d61b5]
/lib/libc.so.6(abort+0x180)[0x7f2b340d8fc0]
/usr/local/mysql/bin//mysqld[0x84b15c]
/lib/libpthread.so.0(+0x68ca)[0x7f2b351e68ca]
/lib/libc.so.6(clone+0x6d)[0x7f2b3417392d]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
120721 00:37:16 mysqld_safe Number of processes running now: 0
120721 00:37:16 mysqld_safe mysqld restarted
120721  0:37:17 [Note] Plugin 'FEDERATED' is disabled.
120721  0:37:17 InnoDB: The InnoDB memory heap is disabled
120721  0:37:17 InnoDB: Mutexes and rw_locks use GCC atomic builtins
120721  0:37:17 InnoDB: Compressed tables use zlib 1.2.3
120721  0:37:17 InnoDB: Using Linux native AIO
120721  0:37:17 InnoDB: Initializing buffer pool, size = 7.9G
120721  0:37:17 InnoDB: Completed initialization of buffer pool
120721  0:37:17 InnoDB: highest supported file format is Barracuda.
InnoDB: Log scan progressed past the checkpoint lsn 445533470024
120721  0:37:18  InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
...

I am getting the same bug in MySQL 5.5.27. Twice in the last week.
I doubt there's a hardware problem as well but wondering it's to do with multi cpu (we have 48). The server gets stuck with metadata locks a lot - we've had to severely restrict the average number of simulataneous queries (< 12) to let it cope

ps We are running Mandriva 2011.0 Linux 64 bit Kernel 2.6.38.7-desktop-1mnb2#1 SMP .

48 Core server with 128GB ram. On crashing, the log is suggesting that we are only using up to 70GB, but that wouldn't take MyISAM into account.

Our queries per second are between 5000 and 8500 normally, though we think it would be 20,000+ if it weren't for the metadata locks

Logs

Attachment: mysql_crash_20130328.txt (text/plain), 5.49 KiB.

Crashed my server on the same line today.
in file srv0srv.c line 2455

mysqld  Ver 5.5.29-0ubuntu0.12.04.2-log for debian-linux-gnu on x86_64 ((Ubuntu))

Attached are my logs

Checked the server's memory, nothing's wrong.

But it's good that the crash is there, it looks like things on the server were being held up by this lock.  If it didn't crash here, it would have locked forever.

Here is another person with the same issue...
http://www.mysteryzillion.org/discussion/11992/crash-recovery-on-mysql-5-5-22/p1&ei=z_xTUd...